DOCUMENT RESUME

ED 090 29[?]                                        TM 003 579

AUTHOR          Millman, Jason; Gowin, D. Bob
TITLE           Development of Course Content Materials For Training
                Research and Research Related Personnel to Appraise
                Research Critically. Final Report.
INSTITUTION     National Center for Educational Research and
                Development (DHEW/OE), Washington, D.C. Regional
                Research Program.
BUREAU NO       BR-0-[?]90[?]8
PUB DATE        Jun 73
CONTRACT        OEC-0-70-4775(520)
NOTE            192p.
AVAILABLE FROM  Available from Prentice Hall Inc., Englewood Cliffs,
                N.J. as "Appraising Educational Research: A Case
                Study Approach". Publisher's price: [illegible]
EDRS PRICE      MF-$0.75 HC-$9.00 PLUS POSTAGE
DESCRIPTORS     *Analytical Criticism; College Students; Content
                Analysis; Critical Reading; *Educational Researchers;
                Evaluation; *Evaluative Thinking; Graduate Students;
                *Instructional Materials; Interpretive Reading;
                Literature Reviews; Research Methodology; Research
                Problems; Research Skills; Technical Reports;
                Theoretical Criticism; *Training

ABSTRACT

     A description of the development of the print materials to
improve the ability of learners to appraise critically educational
research is provided in this report. The completed materials consist
of the following: an introductory statement about the nature of
criticism, a statement about the contents of the materials and
suggestions for use, and nine case studies. Most cases consist of a
research article, special notes intended to make the article more
comprehensible, orienting questions to guide the learner, a "model"
appraisal (answers to the orienting questions), learner responses,
and the product developers' replies to these responses. Specifically
described in this report is the selection of case study materials,
conduct of the two stages of field testing, evaluation of the drafts
of the materials by student users, and bases for revision of
materials. Reproduction of this document has been made from the best
copy available. (Author)







Final Report

Project No. [illegible]
Contract No. OEC-0-70-4775(520)

DEVELOPMENT OF COURSE CONTENT MATERIALS
FOR TRAINING RESEARCH AND RESEARCH RELATED
PERSONNEL TO APPRAISE RESEARCH CRITICALLY

Jason Millman and D. Bob Gowin

Department of Education
Cornell University
Ithaca, New York

June 1973

The research reported herein was performed pursuant to a contract
with the Office of Education, U.S. Department of Health, Education,
and Welfare. Contractors undertaking such projects under Government
sponsorship are encouraged to express freely their professional
judgment in the conduct of the project. Points of view or opinions
stated do not, therefore, necessarily represent official Office of
Education position or policy.

U.S. DEPARTMENT OF
HEALTH, EDUCATION, AND WELFARE

Office of Education
National Center for Educational Research and Development



PREFACE

     We wish to acknowledge the assistance of the several
institutions and individuals named herein who cooperated in the
development and field testing of these instructional materials.
We further wish to thank Prentice-Hall for their willingness to
make these materials commercially available at low cost. Finally,
a special thanks goes to Ms. Dorothy Postemack who served as
technical editor and administrative assistant during most of the
lifetime of the project.

Jason Millman and D. Bob Gowin
Ithaca, New York



TABLE OF CONTENTS

Preface
List of Tables
Introduction
Methods
Results
Conclusions

Appendices                                                   Appendix

Orienting Chapters ............................................. I
Josephson, Do Grades Stimulate Students to Failure? ............ II
Kaplan, "Head Start" Experience and the Development of Skills
     and Abilities in Kindergarten Children .................... III
Hackman, Prediction of Long-Term Success in Doctoral Work in
     Psychology ................................................ IV
Hunkins, The Influence of Analysis and Evaluation Questions
     on Achievement in Sixth Grade Social Studies .............. V
Durkin, Children's Concepts of Justice: A Comparison with
     the Piaget Data ........................................... VI
Harris, Effects of Positive Social Reinforcement on Regressed
     Crawling of a Nursery School Child ........................ VII
Elkind, Motivation and Creativity: The Context Effect .......... VIII
Bronars, Tampering with Nature in Elementary School Science .... IX
Bridges, Effects of Hierarchical Differentiation on Group
     Productivity, Efficiency, and Risk Taking ................. X



LIST OF TABLES

Table 1. Student Reactions to How Interesting They Found the
         Article
Table 2. Student Reactions to the Worthwhileness of the Materials
Table 3. Student Reactions to the Fairness of the Criticism



INTRODUCTION

     Explicit critical appraisal of research products has been a
missing element in the sequence of efforts which transforms
unknowns into knowns and knowns into practical usefulness. It was
our purpose to develop materials to help train research and
research related personnel to appraise research critically.

     Our strategy was to produce materials suitable for use either
in conjunction with a research methods course or separately, and
of interest and value to learners regardless of their level of
sophistication and field of interest within education. The
articles chosen as case studies, therefore, require neither
statistical sophistication nor expertise in the substantive field
in order to be comprehensible. Further, the materials were given
a degree of responsiveness by printing and responding to the
learner responses most often encountered during field tryouts.

     The procedures by which the materials were developed are
described in the Methods section of this report. Some student
evaluations are provided under Results. The actual materials
have been reproduced in Appendices I through X.




METHOD 



     Surveying Existing Guides. Over 50 published sets of guides
for evaluating the research of others were identified. Although
many of these sets provided excellent lists of aspects to consider
in evaluating the research of others (e.g., educational
significance), they had the common failing of not indicating with
actual examples the criteria for judging whether an example of
educational research actually contained to a satisfactory degree
the characteristics deemed important. Thus, the decision was made
to follow the procedures used in this project, namely, to guide
the learner through a critical evaluation of specific examples of
educational research.

     Selecting Articles to be Critically Analyzed. It had been our
hope to use articles which were deemed important examples of
educational research by at least one of several audiences of
educational research products. On October 1, 1970 we solicited
nominations of such research work from the following groups of
individuals:

     (a) 18 directors of reading programs at the graduate
         level selected, using a systematic sampling plan,
         from a list published in: Robert M. Wilson,
         Colleges and Universities Offering Programs in
         Reading, The Journal of the Reading Specialist,
         December 19[?]7, pp. [illegible].

     (b) 38 members of the American School Counselor
         Association who did not have a university or
         college address, selected, using a systematic
         sampling plan, from the 1967-68 Directory of
         Members published by the American School
         Counselor Association.

     (c) 66 superintendents of schools selected, using a
         systematic sampling plan, from the 1969 Roster
         of Members published by the American Association
         of School Administrators.

     (d) 65 individuals listed as "Additional Members
         through April 26, 1968" of the American
         Educational Research Association Special Interest
         Group titled: Professors of Educational Research.
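
The report does not describe its systematic sampling plan in detail. As an illustration only, the sketch below shows one common form of systematic sampling: a random start followed by selection at a fixed interval through an ordered roster. The roster, the function name `systematic_sample`, and the seed are hypothetical and are not taken from the project.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick n entries from an ordered list at a fixed interval after a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and len(frame)")
    rng = random.Random(seed)
    interval = len(frame) / n          # sampling interval (may be fractional)
    start = rng.random() * interval    # random start inside the first interval
    last = len(frame) - 1
    # Guard the index so floating-point rounding can never run past the list.
    return [frame[min(int(start + i * interval), last)] for i in range(n)]


# Hypothetical roster; the actual membership directories are not reproduced here.
roster = [f"member_{i:04d}" for i in range(1, 1201)]
sample = systematic_sample(roster, 66, seed=1970)
print(len(sample), "entries drawn from", len(roster))
```

Because the interval is at least one whenever n does not exceed the roster size, each selected index is larger than the one before it, so the sample keeps the roster order.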



     Only 15 suggestions of significant research were obtained
from the 187 individuals listed in (a) through (d) above. Most of
these suggestions were not used because they were not seen as
research, they were too lengthy for the instructional use we had
in mind, or they were judged as having so little value that the
ensuing critique would be markedly imbalanced in the negative
direction.
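
The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below sums the four groups listed in (a) through (d) and expresses the 15 suggestions per 100 individuals contacted. The variable names are illustrative, and the ratio is not a true response rate, since the report does not say how many individuals replied or whether any replied with more than one suggestion.

```python
# Individuals contacted, as listed in groups (a) through (d).
contacted = {
    "reading program directors": 18,
    "school counselors": 38,
    "superintendents": 66,
    "professors of educational research": 65,
}
suggestions_received = 15  # figure reported in the text

total_contacted = sum(contacted.values())
suggestions_per_100 = round(100 * suggestions_received / total_contacted, 1)

print(total_contacted)      # 187
print(suggestions_per_100)  # 8.0
```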

     In addition, the editors of the seven books of Readings on
Educational Research which comprise the American Educational
Research Association (AERA) published series (except the editor
of the readings on research methodology) were written and
requested to send us the table of contents of their individual
books. It was felt that such lists would represent significant
educational research as viewed by the committee of research
scholars charged with the responsibility of selecting the entries
for each of these books of readings.

     Three tables of contents of the AERA books of readings were
obtained and one article was selected from these lists. Because
of our desire to have the final selections represent a
cross-section of areas of research as well as of research
methodologies, we decided against making multiple selections from
any given book of readings.

     The remaining articles were selected by the Project
Directors. A list of the 10 articles for which initial drafts of
training materials were written is presented below:

      1. Edwin M. Bridges, Wayne J. Doyle and David J. Mahan,
         "Effects of Hierarchical Differentiation on Group
         Productivity, Efficiency, and Risk Taking."
         Administrative Science Quarterly, September 1968,
         pp. 305-319.

      2. Joanne Reynolds Bronars, "Tampering with Nature in
         Elementary School Science." The Educational Forum,
         November 196[?], pp. 71-75.

      3. Dolores Durkin, "Children's Concepts of Justice: A
         Comparison with the Piaget Data." Child Development,
         1969, pp. 59-67.

      4. David Elkind, Joann Deblinger and David Adler,
         "Motivation and Creativity: The Context Effect."
         American Educational Research Journal, May 1970,
         pp. 351-357.

      5. J. Richard Hackman, Nancy Wiggins and Alan R. Bass,
         "Prediction of Long-Term Success in Doctoral Work in
         Psychology." Educational and Psychological
         Measurement, Summer 1970, pp. 365-373.

      6. Florence R. Harris, Margaret K. Johnston, C. Susan
         Kelley, and Montrose M. Wolf, "Effects of Positive
         Social Reinforcement on Regressed Crawling of a
         Nursery School Child." Journal of Educational
         Psychology, February 1964, pp. 35-41.

      7. Francis P. Hunkins, "The Influence of Analysis and
         Evaluation Questions on Achievement in Sixth Grade
         Social Studies." Educational Leadership, January
         1968, pp. 326-[illegible].

      8. Charles H. Josephson, "Do Grades Stimulate Students
         to Failure?" Chicago Schools Journal, December 1961,
         pp. 122-127.

      9. Eleanor Kaplan, "'Head Start' Experience and the
         Development of Skills and Abilities in Kindergarten
         Children." Graduate Research in Education and
         Related Disciplines, April 1966, pp. 4-28.

     10. Henry R. Weinstock and Charles M. Peccolo, "Do
         Students' Ideas and Attitudes Survive Practice
         Teaching?" Elementary School Journal, January 1970,
         pp. 210-218.



     Obtaining Copyright Release. The appraisal materials would
have little value if the original article being critiqued could
not be made available at the same time. The Project Directors
were unsuccessful in locating Eleanor Kaplan. However, her
article had not been copyrighted. Dolores Durkin and Henry
Weinstock refused to give permission to have their articles
reproduced. Permission to reproduce the other seven articles was
received.

     Because of the severe time constraints necessitated by the
requirement to schedule two field tryouts within a single
semester, the Project Directors commenced preparation of
instructional materials to accompany the Durkin and Weinstock
articles before negotiating for a copyright release. Had the
Project Directors realized that such releases would not be
forthcoming, they would have substituted other articles at the
outset.

     Nevertheless, the Weinstock article was dropped because it
did not seem to work out well. The Project Directors persisted in
refining the materials related to the Durkin article because
technically only the journal's permission (which was granted) was
required and because nine critiques were contracted with the
Federal government.

     Obtaining Reviews of Experts. Approximately two dozen
scholars were invited to prepare a critical appraisal of one of
the articles which matched their area of expertise. The names of
the individuals who prepared such reviews and the articles they
reviewed are listed below:

BRIDGES ARTICLE:    Dr. W. W. Charters, Jr., University of Oregon
                    Dr. Robert Ennis, University of Illinois

BRONARS ARTICLE:    Dr. John Easley, University of Illinois
                    Dr. Jonas Soltis, Columbia University

DURKIN ARTICLE:     Dr. Alfred Baldwin, Cornell University
                    Dr. Brian Crittenden, Ontario Institute for
                    Studies in Education

ELKIND ARTICLE:     Dr. J. P. Guilford, Beverly Hills, California
                    Dr. Kenneth Strike, University of Wisconsin

HARRIS ARTICLE:     Dr. Alberta Siegel, Stanford University
                    Dr. Harold Stevenson, University of Minnesota

HACKMAN ARTICLE:    Dr. Leonard Krimerman, University of Connecticut

HUNKINS ARTICLE:    Dr. Kenneth Hopkins, University of Colorado
                    Dr. William Lowe, University of Rochester

JOSEPHSON ARTICLE:  Dr. David Farr, State University of New York
                    at Buffalo
                    Dr. John Milholland, University of Michigan

KAPLAN ARTICLE:     Dr. Gene Glass, University of Colorado

WEINSTOCK ARTICLE:  Dr. George Newsome, Jr., University of Georgia
                    Dr. William Gephart, Phi Delta Kappa



     Preparing Initial Drafts of the Instructional Materials.
Armed with the reviews of the experts as well as reviews written
by students at Cornell University, the Project Directors prepared
initial drafts of instructional materials to accompany each of
the ten articles.

     No single format was adopted; rather, the materials were
developed in a way which seemed most appropriate for the article
in question. In this first draft phase, students were given
instructions to first read the article to be evaluated. In most
cases these initial instructions were followed by a section of
"Special Notes" which explained terms or procedures likely to be
unfamiliar to education students (e.g., use of various
statistical measures, definition of unusual terms, explanation of
previous relevant literature). Following these "Special Notes"
the materials requested students to complete a written assignment
which required focusing critically upon the article. In some
cases the assignment required an overall general appraisal in
which students were to cite strengths as well as weaknesses; in
some cases specific questions about the article were devised
which students were then to answer; in a few cases both kinds of
assessments were requested. Of course in all these cases, the
assignments were developed with an eye to having the student
consider and evaluate all aspects of a given research paper.

     "Model" answers to the various requests for general
appraisal or sets of specific questions were the last section of
these written materials included in the first draft. These
answers were made fairly detailed with the idea that students
would compare their answers to the "model" answers provided and
thereby obtain valuable instruction in how to approach
educational research papers.

     Securing Field Tryout Populations. It was felt essential for
a high quality product that the materials be field tested at
least twice: once after the initial draft and once after a
revised draft. At the time that the AERA Special Interest Group
was solicited for suggestions of research articles, a request was
made to use students in their research methods classes as a
tryout population. Fourteen responses were received. Since this
number was felt to be inadequate for both tryout phases, an
additional 12 instructors willing to participate in the testing
of the materials were recruited at the annual National Symposium
of Professors of Educational Research held in November 1970 in
St. Louis. An additional three instructors who were personal
friends of a Project Director agreed to cooperate. All materials
were also field tested in classes at Cornell University.

     Conducting First Field Testing. During February and March
of 1971 the initial drafts of the materials were sent to twelve
institutions. The names of the institutions and the number of
student responses returned are listed below:

                                                      Returned

BRIDGES:     C. W. Post                               [illegible]
             St. Louis University                     [illegible]
             University of New Mexico                 [illegible]

BRONARS:     Stanford University                       0
             University of Wisconsin                  [illegible]
             University of Maryland                   31
             Kansas State                              8
             University of Nevada at Reno              0

DURKIN:      University of Maryland                   [illegible]
             Kansas State University                   8
             University of Nevada at Reno              0
             Stanford University                       2

ELKIND:      University of Washington                 14
             University of Maryland                   27
             University of Nevada at Reno              2
             Stanford University                       1

HACKMAN:     C. W. Post                               37
             Purdue University                        11
             University of Wisconsin                   0

HARRIS:      C. W. Post                               36
             St. Louis University                     23
             University of Wisconsin                   2

HUNKINS:     C. W. Post                               10
             St. Louis University                      8
             University of New Mexico                  7
             University of Wisconsin                   1

JOSEPHSON:   St. Louis University                     13
             Towson State                             30
             University of Wisconsin                   1

KAPLAN:      St. Louis University                      8
             Towson State                             32
             Purdue University                         9

WEINSTOCK:   Towson State                             32



     The instructors were asked to withhold distribution of the
"model" answers until written responses of their students were
received. The reason for this request was to obtain reactions of
students which were uncontaminated by these "model" answers and
which could serve as data upon which to base future revisions of
the materials.

     Preparing Second Drafts of the Instructional Materials.
Student responses to the initial drafts were carefully reviewed
before developing second drafts of the materials. These responses
were most helpful in indicating (a) where there were ambiguities
in the questions and "model" answers; (b) where it seemed
necessary to cue students more specifically to the desired focus;
and (c) where it seemed wise to offer further supplementary
explanation in certain areas. Not only were the materials revised
with an eye to clarification and amplification, but in the cases
of the Bridges and Elkind articles, additional sections were
developed, Student Responses to Question and Our Replies, to
handle the many points which student responses indicated needed
explanation.

     Conducting Second Field Testing. During April and May of
1971 the second drafts of the materials were sent to the
institutions listed below:

                                                          Returned

BRIDGES:     University of Colorado                        9
             Catholic University of America                8
             Montclair State College                       0
             Arizona State University                      9

BRONARS:     Catholic University of America                9
             University of Northern Iowa                  28
             George Washington University                 15
             Arizona State University                     10
             University of Southwestern Louisiana*         2

DURKIN:      William and Mary                             22
             Catholic University of America                0
             University of Southwestern Louisiana*         3
             George Washington University                 15
             University of Bridgeport                      0

ELKIND:      Eastern Kentucky University                  14
             University of Northern Iowa                  59
             University of Bridgeport                      0
             Arizona State University                     15
             University of Southwestern Louisiana*         2

HACKMAN:     University of Colorado                       20
             Montclair State College                      11
             University of Louisville                     21
             Arizona State University                      0
             University of Southwestern Louisiana*         1

HARRIS:      University of Southern California             0
             University of Wisconsin at Milwaukee         16
             Ohio State                                   11
             University of New Mexico                      0
             University of Southwestern Louisiana*         2

HUNKINS:     University of Georgia                        13
             Pennsylvania State University                 2
             Creighton University                         16
             University of Wisconsin at Milwaukee         11
             Ohio State                                   11
             University of Southwestern Louisiana*         2

KAPLAN:      University of Southwestern Louisiana*         2
             Texas Tech*                                   1
             Eastern Kentucky University                  14
             University of Northern Iowa                  28
             George Washington University                 16

JOSEPHSON:   University of Southern California             0
             Creighton University                         13
             University of Wisconsin at Milwaukee          5
             Ohio State                                   16
             University of Southwestern Louisiana*         2

WEINSTOCK:   University of Southern California             0
             University of Wisconsin at Milwaukee          6
             Ohio State                                   13
             Texas Tech*                                   1
             University of Southwestern Louisiana*         1

*The University of Southwestern Louisiana and Texas Tech were
sent the materials during July 1971.



     In this second draft phase students were asked not only to
complete the written assignments, but also to evaluate our
instructional materials. Specifically, they were asked to compare
their answers to the "model" answers provided, indicating where
they felt these answers were ambiguous, incomplete or in error.
They were further asked for a general evaluation of the article
and its accompanying materials. In this regard, students were
asked to indicate whether they thought the article and the
materials were interesting, dealing with a topic important to the
field; whether the time spent was worthwhile; and whether the
materials were indeed self-instructional. Naturally the responses
varied from paper to paper, and a discussion of these responses
is included in the Results section of this report.

     Soliciting Authors' Responses to the Materials. A copy of
the second draft materials developed for his or her paper was
sent for comments to the eight senior authors who could be
located. Responses were received from all eight.

     Soliciting Experts' Comments on the Materials. A copy of the
second draft materials developed for each article was also sent
to the appropriate experts who had prepared an initial review of
these articles. Each expert was asked to read the instructional
materials and to indicate where he felt we had made serious
errors or omissions in our interpretation of the article.
Virtually no criticism was received.

     Summer Field Test. In addition to the other schools listed
in this report, two copies of each of the papers and
instructional materials were sent to the University of
Southwestern Louisiana and Texas Technical College during July
1971. A total of 19 papers were returned.
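
The total of 19 summer returns can be cross-checked against the counts listed for these two institutions in the second field testing table. The sketch below simply re-enters those counts and adds them up; the dictionary layout and variable names are illustrative assumptions, not part of the project's records.

```python
# Returns listed in the second field testing table for the two summer sites.
summer_returns = {
    "University of Southwestern Louisiana": {
        "Bronars": 2, "Durkin": 3, "Elkind": 2, "Hackman": 1, "Harris": 2,
        "Hunkins": 2, "Kaplan": 2, "Josephson": 2, "Weinstock": 1,
    },
    "Texas Tech": {"Kaplan": 1, "Weinstock": 1},
}

per_site = {site: sum(counts.values()) for site, counts in summer_returns.items()}
total_summer = sum(per_site.values())

print(per_site)      # {'University of Southwestern Louisiana': 17, 'Texas Tech': 2}
print(total_summer)  # 19
```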

     Preparing Third Drafts. Using all the materials gathered
during the first 10 months of this project, final drafts of the
materials were developed during the fall of 1971. In at least
three of the case studies, the authors' own reactions were
incorporated as direct quotes in the model answers. No third
draft of the Weinstock article was prepared.

     Securing a Commercial Publisher. During 1972, negotiation
with Prentice-Hall to publish the materials was successfully
completed.

     Preparing the Final Draft. During February 1973, two
orienting chapters were written to facilitate use of all the
articles as a collection. Based on comments received by
Prentice-Hall editor, Gene V Glass, minor changes were made to
the orienting chapters and to all third draft materials with the
exception of the Durkin article. It is expected that the two
orienting chapters and eight case studies (Durkin and Weinstock
articles excluded) will be published as a paperback during spring
1974.



RESULTS



     Enclosed in Appendices I through X are the third drafts of
the instructional materials to accompany nine case studies
(Weinstock and Peccolo article excluded) and the two orienting
chapters. These, of course, are the principal results of this
instructional material development project; they are the terminal
contracted product. The responses of students to earlier drafts
of the materials were the primary data which stimulated revision
of the materials.

     Content Revisions of the Materials. The revisions in content
made from the first to last drafts of the materials can be
categorized as follows.

     1. Questions were designed to provide more adequate cues
for the desired response. Early in the development effort, it
became obvious to the Project Directors (i.e., project
developers) that the users of the materials were interpreting the
questions differently from the initial intention. Students would
complain that their answers differed from the model appraisals,
not because they weren't able to say the kinds of things given as
answers, but because they didn't realize what was wanted.

     The most frequent change was to add, to questions which
could be answered solely by "yes" or "no," such phrases as
"explain why you answered as you did," "state reasons for your
answer," and "why?". In many cases, the questions were made
longer in order to communicate more clearly the intended
direction the answers should take. Words were defined, limits
set, orienting statements made, and cautions about possible
misinterpretations provided.

     2. Reasons for dogmatic-sounding statements were provided.
Frequently the model answers contained statements which to the
Project Directors were simple statements of fact but which were
challenged by the student readers as dogmatic. In such cases,
reasons were added to support the claim being made.

     3. Inferences about feelings were reduced. Earlier versions
of the materials contained many inferences regarding the feelings
of the research investigators. Student readers were critical of
the Project Directors' willingness to state how other people,
namely the research investigators, felt. Students were also
critical of some of the personal opinions of the Project
Directors. Although these feelings were not always eliminated,
reasons were provided in support of the convictions being
expressed.

     4. Greater explanation was provided. The student answers to
earlier drafts of the materials highlighted sections of the
materials where greater explanation was needed. Hard-to-understand
concepts were clarified.

     5. Questions and sections in the Model Answers were
eliminated. Reactions such as "dumb question" convinced the
Project Directors that some material was less important than
others. Further, other questions and comments were so hard or
technical that few students were able to answer them or
understand the explanations given. In most of such instances, the
material was eliminated.

     6. Mistakes were corrected. Errors in the earlier drafts
ranged from simple typographical mistakes to a few real blunders
on the part of the Project Directors. In addition to revisions
resulting from such clear-cut errors, a very large number of
changes were made not because one wording was right and the other
wrong, but because one wording was more appropriate than the
other. For example, in the Elkind et al. article, the issue was
raised whether the research instruments were valid measures of
the construct, "creativity". The original wording was
"...creativity is not really the dependent variable,..." and this
was changed to "...creativity is not really measured by the
tests,...".

     7. Different expressions were used to express the same
ideas. Very often the student answers were better expressed than
those provided in the Model. In such cases, such wording was
substituted. Further, comments of the research investigators
themselves were sometimes substituted for similar comments made
by the Project Directors because users of the materials expressed
an interest in knowing how the research investigators felt.

     8. Comments were qualified. Blanket statements were often
altered for the sake of accuracy. For example, "There is no..."
was changed to "We know of no..." and "should" to "might".

     9. Student response sections were created. Frequently
occurring or interesting comments of the student users were added
to the model appraisals together with replies from the Project
Directors. These sections were very well received by later users.



     Finally, it should be pointed out that the eight points
listed on pages 14-16 in Appendix I resulted directly from
comments made by the users of the initial drafts of the
materials.



     General Evaluation of the Materials by Students. During the
second field trials, students were asked the following:

          Please give us a general evaluation of these
          materials. Specifically, comment on:

          (a) how interesting you found the article;

          (b) whether you felt the time you spent working
              on these materials was worthwhile;

          (c) whether you can think of other aspects of
              the study you wished we had commented upon;

          (d) whether you think we were too hard or too
              easy on the investigator (can you give
              specifics?); and

          (e) any other comments you might wish to make.

A summary of the students' reactions follows.

     Presented in Table 1 is a tally of the perceived
interest-producing quality of the articles used as case study
material. As can be seen in Table 1, most respondents found the
articles interesting or very interesting. Less than 10% rated the
articles as not interesting. This trend appears for each of the
nine articles.
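
The percentages behind this summary can be recomputed from the column totals of Table 1. The sketch below is illustrative only: the variable names are assumptions, and the choice of denominator (all respondents versus only those who gave an interest rating) is shown both ways because the report does not say which was used.

```python
# Column totals from Table 1.
table1_totals = {
    "very interesting": 119,
    "interesting": 159,
    "not interesting": 37,
    "irrelevant comment": 31,
    "no comment": 78,
}

all_respondents = sum(table1_totals.values())
rated = sum(table1_totals[k] for k in ("very interesting", "interesting", "not interesting"))
favorable = table1_totals["very interesting"] + table1_totals["interesting"]

# Share rating the article "not interesting", under two denominators.
pct_not_of_all = round(100 * table1_totals["not interesting"] / all_respondents, 1)
pct_not_of_rated = round(100 * table1_totals["not interesting"] / rated, 1)
pct_favorable_of_all = round(100 * favorable / all_respondents, 1)

print(all_respondents, rated)   # 424 315
print(pct_not_of_all)           # 8.7
print(pct_not_of_rated)         # 11.7
print(pct_favorable_of_all)     # 65.6
```

The "less than 10%" figure holds when all 424 respondents are taken as the base; among the 315 who gave an interest rating, the share is somewhat higher.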

     The 31 responses coded "Irrelevant Comment" dealt with other
than the interest-producing qualities of the articles. Examples
of comments included in this category are: "enjoyed reading it,"
"thought provoking," "too confusing," and "very informative."*

     The assessment materials were judged to be worthwhile or
very worthwhile by 207 respondents as shown in Table 2.

*Responses of students from George Washington University to the
Bronars and Durkin articles were inadvertently omitted from
Tables 1-3.



                              Table 1

    Student Reactions to How Interesting They Found the Article

                          Response Category

Senior       Very                       Not          Irrelevant  No
Author       Interesting  Interesting   Interesting  Comment     Comment

Bridges           6           11             4           3           2
Bronars          17           16             2           2          12
Durkin           11            9             3           0           2
Elkind           23           29             8           5          25
Hackman          12           22             7           3           9
Harris            6           16             2           3           2
Hunkins          10           20             6           5          14
Josephson        14           11             1           4           6
Kaplan           20           25             4           6           6

Total           119          159            37          31          78


Although some differences exist, the balance of worthwhile over
not worthwhile assessments is maintained for each article. The
materials related to the article by Kaplan were especially well
received.








Table 2

Student Reactions to the Worthwhileness of the Materials
(Second Draft)

Senior       Very                     Not         Irrelevant  No
Author       Worthwhile  Worthwhile   Worthwhile  Comment     Comment

Bridges           2          11           5            3          5
Bronars           4          17           7           11         10
Durkin            4           8           4            8          1
Elkind            5          34          13           15         23
Hackman           2          21          11            6         13
Harris            2          11           6            8          2
Hunkins           5          20           7            6         17
Josephson         4          15           5            3          9
Kaplan            7          35           2            8          9

Total            35         172          60           68         89






Comments listed as "irrelevant" to the worthwhileness
characteristic included the following: "did not mind doing this,"
"it produced food for thought," "took too long but was a good
expose," "it was informative," "the article was worthwhile," "did
not understand the purpose," "I enjoyed the experience," and "the
experience was certainly educational."

It should be kept in mind that different colleges are
represented for the several articles. The reception that a
particular article received seemed to depend in part on how it
was introduced by the instructor. As will be indicated shortly,
several students did not understand the purpose of the exercises
and resented the time it took away from "required" course work.

The third general evaluation question asked whether the
students could think of other aspects of the study they wished we
had commented upon. Some of the answers to this question were
useful in preparing the third drafts of the materials. Since
they dealt with content specific to the individual article, they
are not summarized here.

Table 3 contains a summary of responses to the query

Table 3

Student Reactions to the Fairness of the Criticism
(Second Draft)

Senior       Too     Too                               No
Author       Easy    Hard    Neither    Irrelevant*    Comment

Bridges        1       1        9          4             11
Bronars        6       5       13          6(2)          19
Durkin         0       7        7          7(4)           4
Elkind         3      10       18         29(10)         30
Hackman        6       3       16         12(5)          16
Harris         5       4       11          4              5
Hunkins        2       6       17          6(2)          24
Josephson      5       1       14          2             14
Kaplan         5      11       20         11(5)          14

Total         33      48      125         81(28)        137

*Values in parentheses indicate the number of responses coded
irrelevant which suggested that the assessment was not too hard or
that the criticisms were well supported.






whether we were too hard or too easy on the investigators.
Although the question was worded in a forced-choice format, we were
pleased that 125 respondents answered neither too easy nor too hard
and that those who picked a direction were roughly split between
the too easy and too hard poles.

There were some differences among articles in ways which
we could have predicted. Our assessment was seen as too easy for
the Josephson article, and this was somewhat intentional for two
reasons. Because this article was the anticipated lead case study in
the published collection, we wanted to impress upon readers that all
assessments need not be negative. Further, although the outward
appearance of the article was that it was terribly naive (there were
some glaring weaknesses), we wished to stress some basic strengths
which we expected would be overlooked.

The Durkin review was seen as too harsh. This reaction
was not unexpected either, because although the article looked
sophisticated on the surface we had some serious questions about
the educational significance of the research being reported. It
is possible that the author herself felt that we were being too
harsh because she refused to give approval for us to reprint the
article.

Comments coded "irrelevant" to the question included
"analysis good," "approach highly professional," and "I generally
agree with the comments." As indicated in the footnote of
Table 3, 28 students answered that we were not too hard or that
our criticisms were well supported. Some readers of this report
may prefer to consider these 28 responses under the heading
"neither."
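For readers who do prefer that alternative reading, the regrouping is simple arithmetic on the "Total" row of Table 3. The sketch below is illustrative only; the function name `reclassify` and the variable names are assumptions made here, not part of the original tally procedure.

```python
# Column totals transcribed from the "Total" row of Table 3.
table3_totals = {
    "too easy": 33,
    "too hard": 48,
    "neither": 125,
    "irrelevant": 81,
    "no comment": 137,
}

# Responses coded "irrelevant" that nonetheless said the assessment was
# not too hard or that the criticisms were well supported (Table 3 footnote).
reassignable = 28


def reclassify(totals, n):
    """Move n responses from the 'irrelevant' column to 'neither'."""
    if n > totals["irrelevant"]:
        raise ValueError("cannot move more responses than were coded irrelevant")
    adjusted = dict(totals)
    adjusted["irrelevant"] -= n
    adjusted["neither"] += n
    return adjusted


adjusted = reclassify(table3_totals, reassignable)
print(adjusted)
```

The regrouping leaves the overall number of tallied responses unchanged; it only shifts the 28 responses between two columns.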

The last general evaluation question merely asked the
respondents to add any other comments they wished. These comments
could be grouped into several categories.

First, there were many noncontent-related negative
objections. Many students saw the assignment as an infringement
on their time, especially scarce since the school year was almost
over. Other students were not clear on the purpose of the
assignment, and were we to do it over, we would have given much
more explanation on this matter. Further, the article, the
special notes, the questions, the model answers, and the student
responses and answers were usually distributed separately.
A large number of students reacted negatively to this paper
shuffling chore. In the published version, all materials will
be bound together.






A second general category of responses consisted of negative
content-related comments, dealing most often with the long amount
of time needed to study and critique carefully a particular article.
Other negative comments dealt with the article itself (not
in the field of interest of the person), questioned the importance
of the appraisal activity, or stated the materials were too hard.



A third group of comments was positive; improved skills
and the quality of the materials were most frequently
mentioned. Of the 226 comments, 35% were in this category; 50%
in categories one or two; 7% in both the positive and one of the
two negative categories; and 8% in none of the three categories.







CONCLUSIONS

A set of nine case studies and two orienting chapters were
prepared. The materials underwent marked revisions as a result of
the field tryouts. As gleaned from the student reactions, the
articles were judged interesting and the appraisal materials worthwhile.
The Project Directors are left with three salient impressions about
the development effort and the materials.

First, producing the appraisals was very hard work. It
taxed the scholarship of the Project Directors greatly. Contrary
to expectation, it was found that little of the work could be
delegated and, consequently, the Project Directors had to assume
responsibilities previously thought would be assumed by bright graduate
students.

Second, general principles of research appraisal did not
emerge. Each article seemed to generate its own unique points of
criticism. The Project Directors are convinced now more than ever
that checklists for research appraisal must, by their very nature,
be too shallow to provide the depth of assessment evident in the
present materials. The important tools for the successful critic
would appear to be strategies for handling the appraisal task and
subject matter and methodology content. The true value of the
case studies may be both in reinforcing requisite "habits of
workmanship"--strategies like reading carefully, perceiving the
compromises in design, searching for significance, etc.--and in
providing concepts and facts for the learner to use.


Third, trainers of research workers familiar with the
materials have found them appropriate for their own teaching.
For example, students at Cornell who now have college teaching
positions are employing the materials in their classes; the
Prentice-Hall editor who provided the expert review has adopted
the materials for his class.

There remains the question whether the instructional
materials actually improve the appraisal skills of the users
and, in the long run, have impact on educational knowledge, policy
and practice. Although such terminal evaluations are not part of
the present contract, the Project Directors hope that they will be
forthcoming. Because these materials are being put in the public
domain, both as part of this report and in a commercially distributed
version, others will have an opportunity to evaluate the product
in terms of their own concerns and standards.




Appendix I

                                    Page

Preface to the Collection             2

Orienting Chapter 1                   4

Orienting Chapter 2                  12



Preface 

A surprising fact to us is that the tradition of critical appraisal
is so largely missing in the context of educational research. Very little
good criticism of educational research occurs. Why?

Perhaps it is a matter of assumed senatorial courtesy, or that
the best criticism of other research is simply doing a superior piece of
research, or that since all in education are dedicated good people one
should not be critical, or that a sharp-eyed critic is a dangerous fellow
because he will embarrass a naive or foolish empiricist, or possibly that
research is done to achieve tenure, promotion, increase in salary, prestige,
esteem, more grants, etc. These reasons and others much gossiped about
at research meetings do not really concern us. We think both research
and criticism are matters that intelligent students can be expertly trained
to do well, and we see no reason not to try to improve present practices.
One need not be afraid of criticism.

Very little good material is available as instruction in criticism.
We do think that good criticism is needed in education and this fact has
led us to put forth the effort recorded here.

This book has a rare characteristic: the nine critiques which
comprise the main contribution have been extensively field tested. That
is, the critiques have been developed and modified on the basis of comments
supplied by over 800 students from 27 colleges and universities.
Indispensable to us were the reactions of subject matter experts and of students
participating in the successive field tryouts. (Point A - See p. 3). The
cycle of field test, modify, field test, modify... permitted the materials
to achieve a level of quality not possible otherwise.

Acknowledgment of the help of several groups is in order.
Specifically, we owe much to:



1. the students and cooperating professors of the following
institutions: Arizona State University, C. W. Post, Catholic University of
America, Cornell University, Creighton University, Eastern Kentucky
University, George Washington University, Kansas State University,
Montclair State College, Ohio State University, Pennsylvania State University,
Purdue University, St. Louis University, Stanford University, Towson State,
University of Colorado, University of Georgia, University of Louisville,
University of Maryland, University of Nevada at Reno, University of New
Mexico, University of Northern Iowa, University of S. W. Louisiana,
University of Washington, University of Wisconsin at Madison, University of
Wisconsin at Milwaukee, and William and Mary.

2. the following scholars who supplied us with an initial reaction
to one of the articles: David Farr and John Milholland (Chapter 3), Gene
Glass (Chapter 4), Leonard Krimerman (Chapter 5), Kenneth Hopkins and
William Lowe (Chapter 6), Alfred Baldwin and Brian Crittenden (Chapter 7),
Alberta Siegal and Harold Stevenson (Chapter 8), J. P. Guilford and
Kenneth Strike (Chapter 9), John Easley and Jonas Soltis (Chapter 10),
and W. W. Charters, Jr. and Robert Ennis (Chapter 11).

3. the following professionally-minded investigators, who offered
constructive reactions to our critiques of or further information about
their research articles: Edwin M. Bridges, Joanne Reynolds Bronars, Wayne
Doyle, David Elkind, J. Richard Hackman, Charles H. Josephson and David
Mahan.

4. the publishers and investigators who were willing to grant
permission to reproduce their articles despite the presence of negative
comments.

5. the United States Office of Education, which provided the
financial support for development of these materials.

6. Prentice-Hall for making it possible for these training
materials to be disseminated.



Acknowledgment of the assistance of the scholars and investigators
named above should not be construed to mean that they approve of all
aspects of our appraisals.




CHAPTER 1 
The Nature of Criticism 

There is a sense in which the critical appraisal of empirical
research papers is also an act of research. It is an act of research
because the critic reviews each of the aspects of the research paper very
much in the same way as the original author considers aspects of the
research paper. The key elements in the pattern of inquiry are the same
for both the doing of research and the doing of criticism. In each case
one must take a look at these elements: the nature of the problem, the
phenomena of interest, the telling question, the key concepts, the methods
of work, the knowledge claims and other products of the research effort,
and the value or significance of the research.

The act of critical appraisal is a process of analysis, of breaking
down and taking apart, what was produced by an act of synthesis by the
original author(s). There is another pair of eyes, another mind, another
point of view about the research. Specific training in criticism will in
the long run enhance the fertility of actual research.

Each element in the pattern of inquiry requires the investigator
to select, arrange, modify, and interpret. This process requires
judgments. For example, the selection of a phenomenon of interest and from that
the setting up of a problem involve a judgment that these aspects of the
world of experience are worth inquiring into, implicitly rejecting other
concerns that might be worked on. The precise form of the telling question
is a judgment that this question and not some other question will enable
the researcher to find out something of importance. The use of one set of
key concepts to ask the telling question means that other concepts have been
thought about and rejected for the time being. The research design, the
selection of specific techniques of data gathering, statistical analysis,
the construction of tables and graphs and other ways of presenting the
record of the research effort, employ the judgment that these methods are
better than others that might be used in this case. Finally, the
particular knowledge claims selected as the important ones, the conclusions
that are interpreted by the researchers, signify yet another set of complex
judgments about what is worth reporting and what is seen as having value to
other researchers.

The main point to be made here is that the categories of critical
appraisal are basically no different from the categories of actual inquiry.
The critic should question the judgments made at each stage of the pattern
of inquiry. Specifically, he should ask: What other phenomena of interest
might be relevant? What other way to pose the problem could be thought
of? What different concepts or conceptual systems might have been used?
What alternative designs or methods or techniques for data gathering could
have been considered? What limits to generalizations are found in the
particular way the research is reported? What other values might
conceivably be found in this research? And critical appraisal, like worthwhile
research, depends heavily upon human judgment.

Three Purposes of Criticism 

First, to the extent that research is an attempt to establish the
fundamental and foundational knowledge claims about education, criticism is
the attempt to apply the best human thought to test these foundations.
Whether the research effort is directed at aptitude testing, behavior
modification, organizational change, or instructional material development, nothing
of consequence follows if the research is faulty. A science builds upon
its foundations, and confidence is a result of a tested faith in those
foundations. Further, because research is open ended, criticism can point
to avenues of additional research needed to solidify our foundations of
knowledge.

The second aim of criticism concerns policy-making and
implementation. Policies are complex judgments, based partly upon facts and
knowledge claims and partly upon values and value judgments. Policies
are plans for action. To educate, to intervene in the lives of other
human beings, are serious moral undertakings. If a lack of knowledge is
allowed to persist where knowledge could be obtained, the policy made and
the action undertaken are grossly negligent of concern for the moral worth
of other people. Criticism has a special role in policy analysis because
it makes explicit this relation between knowledge and value found in
educational policies.

The third aim of criticism concerns educational practice. In
spite of rhetorical claims to the contrary, research has had little effect
upon educational practice. Because there is always the potentiality that
research will be conceived so as to change practice, criticism must obtain
here too. The taking of thought to improve practice can lead to finding
out facts, to discovering relations, to solving problems, to dispelling the
comforting but misleading conventional wisdom. Criticism can be applied
directly to the problems of justifying educational practice, but it ties
up with research when it suggests the role of research in making practice
more efficient, more effective, more humane, more insightful in its
complex operation.

Criticism and Literary Criticism

We find it useful to borrow from the field of literary criticism
a set of distinctions we think apply to criticism of educational research.
Literary critics distinguish aspects of criticism into four elements: the
author (or artist), the work, the audience, and the universe.* These
distinctions are useful because we find that importantly different criteria
of assessment apply to different elements. For example, when we evaluate
research we can begin a criticism by checking the authority of the author
and we give the reasons for saying that the author or authors are experts
in the area of research. Individuals with a history of high quality research
justifiably deserve our attention because they have over the course of
years earned the label of expert. Experts are in a sense highly calibrated
instruments; we trust their "readings," the points they make. Of course,
any person is fallible; experts have their off day, busy people make
mistakes and so on. Nevertheless experts continue to deserve the label as they
continue to employ high standards for their work.

Many judges of research papers (editors of journals for example)
make a practice of not knowing the name of the author. This practice is
one way to force attention to the work itself. Criteria of excellence
commonly applied to individual works are very familiar: coherence of the
reasoning from the problem statement to the conclusion, justification of
the significance of the problem in the context in which it is placed,
elegance of the design, choice of techniques of measurement, completeness
of analysis, originality or novelty or creativity (breaking new ground),
generation of new paradigms as well as connection to older paradigms to
supply continuity with previous research.

Literary critics also judge the value of a work of art by the
effects it has upon an appropriate audience: does it entertain, edify, point
a moral, stimulate applause? Research products are also judged for their
contribution to individuals who use the research products.

Does the set of knowledge claims of the research report stimulate
consideration of educational changes? There exists a balancing of
judgment between research that is socially relevant, that solves or
contributes to the solution of an immediate social problem, and research
for which no social consideration is relevant because the
research contributes to the furthering of scientific knowledge which at the
time does not seem to have any social relevance. This comparison is
sometimes referred to as the scientist riding a white horse (to change society)
versus the scientist wearing his white coat (to contribute to scientific
knowledge).

The process of education is necessarily social. And the
conservation and continuity of a social order necessarily requires education.
Every adult (indeed every human who acquires a language) is educated in a
social context, whether through formal schooling or not. In a common sense
way every person knows something about education. This common sense
knowledge, or conventional wisdom of the audience, often stands in the way of
establishing scientific knowledge.

The fourth element for the focus of criticism is called by literary
critics "the universe." The term we have used for this element in these
materials is the "phenomena of interest." We have in mind here the "stuff,"
the subject-matter, the kind of thing the research is about. For example,
in one of the studies in this book the authors are concerned with
productivity of groups as it relates to the structure of the group. These
phenomena are of considerable interest to school principals, industrial
managers, and others for the reason that adequately anchored knowledge
claims could provide a valuable guide for the administrator. In this regard
the research would be judged as potentially significant; we say potentially
significant here because the significance of such a study is not achieved
by what it is about, but by what it tells us of what it is about. In
other words, the phenomena of interest can be very important and the research
relatively trivial if it fails to penetrate into the phenomena in any
successful way.






As suggestive and fruitful as the literary critics' model for
criticism is (and we only sketch it here), it has some shortcomings
as well. A chief shortcoming is the lack of focus upon methods of work.
We do not feel that we are going to mislead an audience of educational
researchers, however, because the omnipresent focus of criticism found in
contemporary educational research is precisely a concern with methods,
with techniques of work, with research design, with statistical analysis.

Old Chestnuts in Dispute 

Sometimes it is held that research is creative, that research
generates new knowledge about the way the world works. On this ground
research is distinguished from scholarship (sometimes called library
research) which only puts together or comments on knowledge which others
have produced. So, on this ground the products of criticism can be called
scholarship. Whatever the label agreed on, the relation between research
and scholarship can be very close. Criticism which reveals faults in
purported knowledge claims is both creative and valuable. Moreover, as indicated
previously, both research and criticism (scholarship) require judgments
about the same processes.

We have learned much from the college students who helped by
using early drafts of these critical appraisals. One thing which many
students reported had inhibited their own appraisal was the lack of
knowledge about statistics. We urge students, and other critics, to become
knowledgeable about statistics, but we also recommend that one not be too
easily blinded by statistics. A kind of mindless reverence for numbers,
tests of coefficients, F ratios and the like is to be avoided. One can
still use judgment to see whether the data analyzed actually relate in a
satisfactory way to the basic question, and how useful the data are in
composing an answer. Any complex statistical analysis can be
paraphrased in words, and the relations between variables can be interpreted
in terms of the key concepts and the major knowledge claims. Research
reports which rely on tests of statistical significance alone to establish
educational significance are justifiably criticized.

Many people feel that the best way to criticize research is to
compare the work against a checklist of possible faults. Many such
checklists have been produced.¹ Checklists can be valuable, for they serve as
a reminder of key features of an investigation which should be considered
in any appraisal.

Checklists, however, have at least two major shortcomings. First,
they do not provide the criteria to judge the criteria. On what basis is the
critic to decide if "the instruments are valid" or "the design appropriate"?
Such judgments require knowledge of facts, concepts, and research paradigms.
The model appraisals in this book are often very lengthy precisely because
we have attempted not only to share our judgments but also to provide the
basic information needed to reach such judgments.

A second shortcoming of checklists, in our opinion, is their
almost total preoccupation with methods of work--i.e., with questions of
research design, measurement and analysis. The methods of work are very
important, of course, for they can make the difference between securing
valid or invalid knowledge claims. But as researchers become more
sophisticated about these things and the number of investigators capable of
producing reasonably "tight design" research grows, it becomes increasingly
important to ask, as well, a different set of questions about the research--
questions such as its import for education and its implications for policy
or practice. In the appraisals in this book, we have attempted not to
slight these other dimensions of the appraisal process.

¹For a bibliography of such checklists, see Bruce B. Bartos, "A Review of
Instruments Developed to be Used in the Evaluation of the Adequacy of
Reported Research." Bloomington, Indiana: Phi Delta Kappa, Research
Service Center, Occasional Paper #2, 1969.






CHAPTER 2

Nine Research Articles and Critiques: Description and Use

Implicit in the development of this book is our assumption that
repeated practice is required to learn to appraise educational research
critically. Consequently, we have selected several research articles for
appraisal. Before beginning this analysis task, the following comments
about the articles and the use of these materials are important.

DESCRIPTION 

Characteristics used in selecting the nine articles reproduced in
this book were problem area, methods of work, value, and difficulty.

Problem Area. A wide variety of educational topics are
represented by the articles. Provided in Table 1 is a brief description of the
primary problem area associated with each article. Several of the articles
have abstracts which indicate more precisely the content of the research
report.

Methods. An attempt was made to select articles utilizing a
diversity of approaches. In Table 1, brief labels are given for research
types, but these tend to mask the differences in methods employed by the
several investigators.

Value. All the articles have redeeming features. It is true
that we found much to criticize about all the articles, but any article
can be criticized negatively. In our opinion, the articles are of
reasonable quality from which there is much to be learned.






Table 1

SUMMARY OF THE RESEARCH ARTICLES

         Senior                                                Type of
Chapter  Author     Problem Area                               Research

   3     Josephson  Grading and student attitudes              Status

   4     Kaplan     Evaluation of a Head Start program         Status

   5     Hackman    Prediction of "long-term" success          Prediction

   6     Hunkins    Effect of questioning procedures           Experimental
                    on student achievement

   7     Durkin     Development of a concept of justice        Status

   8     Harris     Reinforcement and behavioral               Case study/
                    modification                               experimental

   9     Elkind     Factors affecting the validity of          Experimental
                    creativity assessments

  10     Bronars    The case against experimenting with        Philosophical
                    live animals in elementary school          analysis

  11     Bridges    Small group composition and                Experimental
                    productivity



Difficulty. Results of field testing indicate that all articles are
understandable to college students. We avoided articles having sophisticated
statistical analyses or dealing with topics requiring prior expertise in a
specific content area to be understood. Several of the articles are
accompanied by special notes in which an occasional technical term or isolated
material is defined or explained. Although there are some minor variations
among them, all articles are moderately easy to understand.

The level of sophistication of the critiques, however, is not equal.
In this respect, the articles are arranged in a crude ordering from simple to
hard.

The articles may be read in any order because each illustrates
different concepts of research and these are not arranged sequentially.
Nevertheless, it is probably wise to begin with one of the studies listed
toward the top of Table 1 and work toward those having a more intensive and
sophisticated appraisal. Regardless of the article being analyzed, keep in
mind the following points.

1. Any piece of research can be criticized negatively. The perfect
study does not exist. Any investigator is operating within a system of
constraints and must make compromises. The fact that weaknesses (as well as
strengths) are evident in every study should not be interpreted to mean that
the studies are without value. Quite the contrary. We consider each of the
investigations in this book worthy of study.

2. Not all articles that one reads deserve the time needed to
perform a thorough analysis as provided with the studies reproduced in this
book. The professional must place priorities on how he spends his time.
There will be occasions, however, when specific studies have particular
importance to a researcher or educator, and for these occasions it is most
desirable that he can appraise the work critically. Although the articles
in this collection will not be particularly important to many readers, it
is well, nevertheless, that they practice critically appraising the articles
so that this skill can be learned and then applied to works considered by
readers to be more important.



3. The reader must be careful not to infer (improperly) that
because the problem area of a particular article is "irrelevant" to his
specialty, the task of appraising the article is therefore irrelevant
or valueless. The primary purpose of this book is to provide the reader
with a set of generalizable skills. The specific articles are merely
vehicles through which basic concepts can be taught and habits of
workmanship practiced. Much is to be learned about the appraisal process
regardless of the particular examples used for illustration.






4. The distinction between the research investigator and his
work should be kept in mind. The reader should avoid taking sides for
or against the investigator; avoid trying to be easy or hard on him.
Rather, the task is to identify the strengths and weaknesses of the
work itself and what these assessments mean for the educational value
of the study and for the interpretations or knowledge claims resulting
from the investigation.



5. Frequently not appreciated by readers participating in
field tryouts of the materials is that the learner's expectation should
not be to duplicate the model critique. Most readers are simply not
able to appraise a study to the extent found in the model critiques.
The model critiques are more complete and detailed than can be reasonably
expected from even experienced researchers. The purpose of the model
critique is not to serve as the standard which students are expected to
meet. Rather, the critiques are complete and sometimes overblown statements
designed in part to teach concepts and principles.

6. It is our intention that the materials be used either for group or
individual instruction. In an effort to make the materials self-instructional,
we have made heavy use of "student responses." Frequently these are
representative replies of student readers participating in the field tryouts
of the materials. These student responses are likely to be similar to comments
that you, the present reader, may have made. By providing our response to
these statements, we hope to increase the interactiveness of the materials and
their viability for self-instructional use.

7. The reader is expected to read carefully each article and then to appraise
the work by responding to one or more questions. Many students who
participated in the field testings performed poorly on the appraisals because
they failed to read either the article, the questions, or the appraisals
carefully. We've heard much about programs designed to increase reading speed.
In our opinion, people need to be instructed how to read more thoughtfully.
The first principle in research criticism is to consider actively what one
reads. The world needs more plodders!






8. One can simply read through these materials like a textbook and passively
consider the appraisal tasks and model answers. Alternatively, the learner can
write a response to each task, thus helping to insure his active involvement.
We much prefer the latter. Appraising the work of others is a "doing" task
just as performing research is. Neither performing nor criticizing research is
easy; attention to detail is required, the work is demanding, the rewards are
high.



Appendix II



Charles H. Josephson
Do Grades Stimulate Students to Failure?
Chicago Schools Journal, Dec. 1961, pp. 122-127



Do Grades Stimulate Students to Failure?

Charles H. Josephson
Chicago Schools Journal, Dec. 1961, pp. 122-127



1. Question:

Before you begin, an important point needs to be made. In study after study
that we review, all too often the problem which occasioned the research, and
which is used to introduce the research report, turns out not to be the problem
actually dealt with by the study as conducted. We are reminded of a Peanuts
cartoon in which Linus stands at Violet's front door and asks, "Hi, Violet.
Can you come out and play?" Violet responds, "You're younger than I am."
A puzzled Linus turns to the reader and queries, "Did that answer my question?"

The mismatch between problem statement and answers collected by the
investigator is seldom as gross as that confronted by Linus. A good critic must
be alert for such incongruity. He may ask: Do the data provide evidence about
the stated problem? Given the data actually collected, what question could be
composed to which the data would be an answer? Has the phenomenon of interest
shifted as the study progressed? What conclusions and interpretations is
the investigator entitled to draw from the findings?

Five possible problem statements about which data could have been collected
are listed in this first question. We are asking you to practice an important
skill - namely relating data to the question posed. There is a sense in which
one does not really know what the problem is until the solution emerges. Another
way of stating this point is to say that any question will remain ambiguous
until data which count as the answer to the question are specified.

Consider the following statements:

A. "Grades stimulate students to failure."

B. "Students in slum schools find it more rewarding to be considered academic
failures than successes."

C. Students "most likely to succeed" feel the strongest pressure to fail.

D. "...in lower-class schools students of low ability will desire high grades,
and students of high ability will desire low grades."

E. "There is a discrepancy between aspiration and achievement."

Which one of the above options most accurately reflects the problem statement
that the data of this paper deal with? Why? Give reasons for rejecting each of
the other items.

Note: We are NOT asking you which statement is true. We are asking you to
indicate which statement represents a hypothesis the investigator attempted to
test empirically, i.e. the hypothesis about which the investigator collected
data.






1. Answer:

A. Answer A is quoted from the title. Titles of articles are almost always
both illuminating and misleading. Except in the most technical of journals,
titles are phrased in ordinary language and it is difficult to achieve precise
meaning with the looseness and ambiguity of ordinary language. Note
ambiguities in a key word of this title, "failure." "Failure" can have three
meanings: (a) to fail out of school (as a drop-out, perhaps); (b) to get a
failing grade in a single course (as to fail algebra); (c) to fail to achieve
at a level commensurate with ability (underachieving).

For the three reasons which follow, option A was not considered the best
statement of the question to which the data reported in the study are
relevant. First, the word "stimulate" suggests a causal connection and no such
relationship between grades and failure was established. Further, the actual
grades students receive are not given, and thus we have no data about failure
in the sense of a teacher giving a pupil a failing grade. Finally, the data
which are gathered pertain to a slum school and the students in several
tracks, and these facts are not mentioned in option A.

B. Option B is Davis' position but this investigator does not actually collect
data on what is rewarding to students. Nevertheless some support for this
position would be a finding that of 106 students interviewed in the slum
school, a large number aspire to (i.e., would "select") grade 5, the failing
grade. However, not one gave that response and the author paid no attention to
this fact. One might go one step further to ask if the data presented by this
investigator actually could be interpreted as falsifying Davis' position as
stated in option B. The answer is yes, if one can establish that what is
rewarding to students and what they "would select" are identical.

1. For a further discussion hear Robert M. W. Travers, The Limitations of
Variables Derived from Common Language. Washington, D.C.: American Educational
Research Association. Cassette Tape Series X(», 1971.






C. Although the investigator states option C as a beginning hypothesis
(paragraph #4), he gathers no data on peer pressure; he thus cannot compare
pressure to fail with a measure of likelihood of success.

D. We think this choice is the most accurate one. See page 2, bottom
paragraph, where the hypothesis is explicitly stated. Note also that the table
giving the data closely follows the hypothesis. Recall that in question 1 we
asked if the hypothesis was tested in this study and not whether it could be
considered true on other grounds.

E. Although we have data on an expectation of achievement, we have no data on
achievement itself and therefore cannot compare achievement to aspiration. The
three tracks are said to represent ability levels. If they are also viewed as
defining an achievement variable, then some gross data on the discrepancy
between aspiration and achievement are provided and option E could be
considered an acceptable (but probably not the best) answer.



2. Question:

Having thought about the real purpose of this study, cite one very important
reason why research on the broad question addressed in this paper is of value.






2. Answer:



The reward system of a slum school is being studied. There are several
acceptable reasons you might have given to explain why research on this topic
is of value. One that appeals to us is that IF the research should point up
the fact that the grading system isn't working as intended, that "the
teacher's reward has become the student's punishment", or that the extrinsic
rewards (for example, the grades) of the system wield such a powerful
influence that the intrinsic rewards of learning are diminished or bypassed,
THEN such distortion would provide support for changing present educational
policies and practices. The primary aim of an educational system should be its
true educational goals and not the external trappings attached to these goals.
Florence Nightingale once said of hospitals that, at least, they should not
spread disease; school systems should not discourage true learning.



3. Question:

Refer to the data presented on the bottom of page 2 and to the investigator's
descriptive labels for the grade categories on the bottom of page 3. Which
one(s) of the following statements is (are) factually correct interpretations
of the findings for students in the accelerated class?

A. Only 1/3 prefer superior grades; nearly the same number prefer average or
below average grades.

B. About 2/3 prefer grades above average; only 2 students preferred below
average grades.

How do statements A and B differ in the impression they give?






3. Answer:



Both statements are technically correct given the investigator's
interpretation that 1 means superior and 3 means average. They differ in the
impression they give the reader. The A statement suggests a failure of the
school to keep high the aspirations of good students. The B statement suggests
most students in accelerated classes want good grades. The A statement is the
way this investigator interprets the findings (last sentence, p. 3). We think
it acceptable for a researcher to try to find what his reasoning leads him to
expect. He should not, however, stop at this point but should examine
alternative explanations. We must remember that one can say a cup is half full
or half empty and be correct in both instances. A researcher should be able
to, and further has an obligation to, say both, realizing the different
possible impressions he may give his readers from these different viewpoints.



4. Question:

Note on the top of page 3 that from each of the 3 programs (remedial, regular,
accelerated) one class was selected in some unspecified fashion.
Alternatively, the investigator could have selected the required number of
students randomly from all the students enrolled in each of the programs. We
believe this latter selection plan to be far superior. Why?



4. Answer:

The investigator wishes to compare the grade desires of students of different
ability levels. Because he selected only one class from each program, he
cannot distinguish differences due to program/ability level from those due to
classroom influences. We know from other research that on many variables
classrooms differ markedly from one another even when the classrooms are
composed of students of the same general ability. The particular teacher,
classroom peer relations, and other factors can lead to a distinctive kind of
response from students in a particular classroom. The responses of pupils from
one of the classes might not be typical of those from other classes in the
same track. Thus the differences the investigator notes in the data shown on
the bottom of page 2 may not be due to program/ability level group differences
at all but to other attributes of the three particular classrooms he selected
for the study. Had a random sampling procedure been used, students from
several classrooms within each program would have been selected and this
source of confusion in data interpretation would have been avoided.
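
The confounding of program with classroom can be made concrete with a small
simulation. The sketch below is an illustration only and uses no data from the
Josephson study: the function name, the number of classrooms per program, the
class size, and the sizes of the classroom and student influences are all
assumed values. It simulates two programs that in truth do not differ, and
asks how large a program "difference" tends to appear by accident under each
selection plan.

```python
import random
import statistics


def spread_of_program_gap(design, n_studies=2000, n_classes=10,
                          n_students=35, sd_class=1.0, sd_student=1.0,
                          seed=1):
    """Standard deviation, over repeated studies, of the observed gap between
    two programs that in truth do NOT differ (all values are hypothetical).

    design = "one_class": all students come from one intact classroom.
    design = "random":    each student's classroom is chosen at random.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_studies):
        program_means = []
        for _program in range(2):
            # Each classroom has its own influence (teacher, peers, etc.).
            class_effects = [rng.gauss(0, sd_class) for _ in range(n_classes)]
            if design == "one_class":
                effects = [rng.choice(class_effects)] * n_students
            else:
                effects = [rng.choice(class_effects) for _ in range(n_students)]
            scores = [e + rng.gauss(0, sd_student) for e in effects]
            program_means.append(statistics.mean(scores))
        gaps.append(program_means[0] - program_means[1])
    return statistics.pstdev(gaps)


one_class = spread_of_program_gap("one_class")
randomized = spread_of_program_gap("random")
print(f"one intact class per program: {one_class:.2f}")
print(f"random sample of students:    {randomized:.2f}")
```

Under these assumptions the accidental gap between programs is typically much
larger when one intact class stands for each program, because the whole
classroom influence is carried into the program mean instead of being averaged
over several classrooms.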






5. Question:

Recall that when the students were divided into the three programs
(accelerated, regular, remedial) and their desired grades noted (see data on
the bottom of page 2), the investigator concludes that the expected
"...inverse relationship between ability and grades desired does not obtain."
(p. 3) However, when the investigator reclassifies the regular and accelerated
students into a single category, "...a significantly different picture
emerges." (p. 3) Is it wrong for an investigator to manipulate his data in
this way in search of confirming evidence? Why?



5. Answer:



We don't think so, provided the cautions mentioned in the next paragraph are
noted. Such "teasing" of the data in which after-the-fact hypotheses are
tested can provide insights into the subject of the research. Such unplanned
analyses, however, are generally more valuable as possible leads for future
research than as firm conclusions.

We suggest these cautions. First, the data should be presented in the manner
the investigator had expected to present them before the data were collected
(the present investigator does this) or else the departure explained. Second,
the investigator should state or imply (as the present investigator does) that
the particular analysis presented was suggested to him only after the data
were observed. Third, the investigator should also report plausible
after-the-fact analyses which do not support his expected conclusions. In this
regard, it is of interest to note that in this study the largest group
differences occur when the extreme groups, the remedial and accelerated
classes, are compared to the regular classes. This finding, if replicated by
others, would suggest a much different interpretation from that provided by
the investigator. Finally, relationships found as a result of such
after-the-fact manipulating must not be taken too seriously, especially those:
(a) not predicted ahead of time; (b) not amenable to a reasonable
interpretation; and (c) emerging from a large number of comparisons. When
enough things are examined, some comparisons will seem "significant" by chance
alone.
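
The last caution can be put in numbers. The short sketch below is an
illustration, not a calculation from the study; it assumes that the
comparisons are independent of one another and that each is judged at the 5%
level, and the function name is ours. It gives the chance that at least one
comparison looks "significant" when no real differences exist at all.

```python
def chance_of_false_alarm(n_comparisons, alpha=0.05):
    """Probability that at least one of n independent comparisons is
    'significant' at level alpha when no real differences exist."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


for k in (1, 5, 10, 20):
    print(f"{k:2d} comparisons: {chance_of_false_alarm(k):.2f}")
```

With a single comparison the chance is the nominal 5 in 100; with twenty
independent comparisons it rises to roughly 64 in 100. Comparisons made on the
same data are seldom fully independent, so the figures are a guide only.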



6. Question:

Both in the case when the data for the three programs (ability level
groupings) are kept separate, and in the case when the data for the regular
and accelerated classes are combined, the differences among programs in the
per cent of students desiring the various grades are not statistically
significant according to our calculations. What is the importance of this
statement?






6. Answer:



Lack of statistical significance means that the differences among the
percentages in the three columns in the table on page 2 could be due, not to
differences between program/ability level groups in the grades desired, but
simply to errors in sampling. Failure to get statistical significance can be
interpreted as a vote of no confidence that the differences which were found
will be observed with another sample of students. The investigator should have
realized that the program/ability level group differences should not have been
taken seriously and refrained from such strong definitive language as: "It
seems uncontestable that a distinctively low-aspiration group (i.e. the middle
group) emerges from these findings." (p. 5).



7. Question:

Although there are serious flaws in this study, there are also some
commendable aspects. List four such positive features (not conclusions) of
this paper.






7. Answer:



The following list is meant to be suggestive and not necessarily complete.

a) The investigator sees research as having a clear bearing on educational
policy and practice, and suggests changes in these practices based on such
relevant research.

b) He uses his reasoning powers in the search for an explanation (but not a
generalization) of phenomena he thought he observed in the schools.

c) Even though a teacher in the schools at the time of the study, he does a
study - collects the data in situ - which makes good use of an educationally
relevant context. We think more studies should be done by people making
decisions about practice as a consequence of the studies undertaken.

d) The investigator cites a puzzling observation in the literature (Alison
Davis's position) which is an impetus to research.

e) The investigator realizes some of the inadequacies of his study and that
more complete and better planned ones need to be made.

f) He publishes locally where the impact of such a controversial study will
most likely have an effect.

g) The investigator attempted to obtain valid measures of aspirations. He
thought of devices (e.g. anonymous responses and additional questions) as an
"honesty check" to the first question. We do not claim he was successful but
do commend the attempt.



h) He manipulated his data in more than one way. (See question and answer 5.)






i) The research was open-ended in the sense that it suggested further
investigation.

j) The paper was highly readable and written in an interesting fashion.



Concluding remark: In their classical paper, Campbell and Stanley wrote:

At present, there seem to be two main types of "experimentation" going on
within schools: 1) research "imposed" upon the school by an outsider, who has
his own ax to grind and whose goal is not immediate action (change) by the
school; and 2) the so-called "action" researcher, who tries to get teachers
themselves to be "experimenters", using that word quite loosely. The first
researcher gets results that may be rigorous but not applicable. The latter
gets results that may be highly applicable but probably not "true" because of
extreme lack of rigor in the research. (p. 21)

The present paper clearly falls into the second category.



1. Campbell, Donald T. and Julian C. Stanley, Experimental and
Quasi-Experimental Designs for Research, Rand McNally, Chicago, 1966.



Appendix III

Eleanor Kaplan

"Head Start" Experience and the Development of
Skills and Abilities in Kindergarten Children

Graduate Research in Education and Related Disciplines
Vol. II, No. 1, April 1966



SPECIAL NOTES 



The present article would be classified as an example of educational
evaluation. There is disagreement among experts regarding the distinction
between evaluation and research. Some say that the purpose of evaluation is to
derive assessments of the worth of particular instances of educational
undertakings such as individual textbooks and specific programs; the purpose
of research is to produce generalizable conclusions. We see the distinction to
be one of degree rather than kind. In both kinds of study we ask whether the
activities followed permitted the investigator to accomplish the objectives of
the study.

For each set of data the investigator conducted a chi square test of the
statistical significance of the difference between the score distributions
found for the two groups of children. The investigator is seeking to determine
whether the difference in the proportion of students in the two groups who
were above a particular score could happen by chance alone. Specifically, the
statistical test indicates the probability of getting such a large difference
in proportions if only chance (i.e. sampling variability) were operating. When
this probability is small (defined in this paper as less than 5%), the
chance-alone hypothesis is not very likely; the investigator then indicates
that the difference was statistically significant and, presumably, the Head
Start program had an effect.
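
The logic of the test can be sketched in a few lines. The example below is an
illustration under stated assumptions, not a reproduction of the
investigator's computation: the counts are hypothetical, the function name is
ours, and the formula is the ordinary Pearson chi square for a 2 x 2 table
without a continuity correction (the report does not say whether a correction
was used).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) for the 2 x 2 table
        [[a, b],
         [c, d]]
    together with its p-value on 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: children above / at or below a chosen cut score.
chi2, p = chi_square_2x2(25, 10,   # Head Start group (35 children)
                         14, 21)   # comparison group (35 children)
print(f"chi square = {chi2:.2f}, p = {p:.4f}")
if p < 0.05:
    print("significant at the 5% level")
else:
    print("not significant at the 5% level")
```

A small p-value says only that chance alone is an unlikely explanation of the
difference in proportions; it does not by itself show that the program, rather
than some other difference between the groups, produced it.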

On page 17, just before the last paragraph, the parenthetical expression
should have been written: (.05 < p < .10). The symbol, <, means "less than."
Thus, differences between two groups on enunciation scores as large as those
actually found could be expected to occur 5 to 10% of the time even if chance
alone were operating (i.e., program had no effect). This probability wasn't
small enough for the investigator to reject with confidence the hypothesis
that for a population of children similar to these 70, no differences on this
variable would be found.






QUESTIONS:

1. "The purpose of this study was to evaluate whether the children who
participated in Project Head Start were better prepared for kindergarten
than those who did not participate..." To accomplish this purpose, the
investigator: 1) reviewed the literature, 2) stated hypotheses, 3) selected
subjects, 4) selected and constructed measuring instruments, 5) administered
and scored tests, 6) performed analyses, and 7) drew conclusions.

A. What two importantly different kinds of information are contained
in this review of the literature? What, in general, are the main
purposes of any review of the literature and how well did the
investigator succeed in achieving these purposes?

B. Write a critical appraisal of each of the other six aspects of the
study identified above, being sure to cite strengths as well as
weaknesses.



2. The investigator evidently feels that the Head Start programs involved
in her study were very effective and worthwhile. Yet there is information
needed in addition to that given in the report if one is to reproduce such an
effective program elsewhere. What information is lacking in the report which
prevents it from serving as a guide to one who must develop and operate a Head
Start program? (Assume that the leader has much freedom in how he plans and
runs a Head Start program.)







"Head Start" Experience and the Development of
Skills and Abilities in Kindergarten Children

Eleanor Kaplan

Graduate Research in Education and Related Disciplines

Vol. II, No. 1, April 1966



One kind of information in the literature is the description of the social and
political forces which in 1965 were changing drastically the prekindergarten
public education of economically and socially disadvantaged children. The
Kaplan report indicates that by 1965 Project Head Start "benefitted" 560,000
youngsters in 2,500 communities at an estimated cost of $112,000,000. A second
kind of information in the literature review is more commonly found. The
investigator cites empirical studies (e.g., Bernstein, 1960, 1962; Deutsch,
1956b) and studies of new educational practices (Graham and Hess, 1965; Hess
and Stosen, 1965).

One main purpose of a review of literature section in empirical studies is
to describe the educational context in sufficient detail that the
justification of the study is clear. The literature review succeeds fairly
well in giving us the political, historical and empirical context of the
study. These political and social changes to educational practice which the
investigator documents serve as an excellent stimulus and justification for
educational research.*

A second main purpose of a review is to indicate the source of concepts and
principles used to guide the inquiry. One can find instances in which the
evaluation was influenced by the empirical studies and writing quoted in the
review. One example of the influence of these sources on the conduct of the
inquiry is the literature which points to the need for emphasis on language
teaching for the disadvantaged. This information justifies the inclusion of
language development measures in the study.

* Among social scientists and educational researchers there often exists a
tension between being socially relevant ("on a white horse") and
scientifically rigorous ("wearing a white coat"). Whenever social changes take
place rapidly and pervasively, the tension can develop into a rift. In our
opinion, this division is unnecessary and counterproductive. Social changes
can be thought of as an excellent stimulus to empirical inquiry, as we
indicate about the Kaplan study. More than that, the empirical researcher who
can say as a result of inquiry that he knows both the facts and the
educational consequences of political and policy decisions can become a
valuable influence upon the shaping of future educational policies. Many
researchers would prefer to spend money on research before changes are made so
that they might be made intelligently in the light of new knowledge. Social
urgencies dictate otherwise sometimes. Perhaps the best course is to combine
the two: research can change policy and practice, and changes in policy and
practice can be a valuable stimulus to further research. For a discussion of
some of these issues, see Nevitt Sanford, The American College, John Wiley &
Sons, New York, 1962, pp. 1-30.

A third main purpose is to provide a theoretical context from which the
knowledge claims of the inquiry can receive intelligible interpretation. There
is none of this material in the review. Some readers would say that the large
differences found after a short summer program are rather remarkable, yet
there is no theoretical context, nor even an educational rationale, provided
which can help us to account for or make sense out of these findings.



1. Hypotheses. The hypotheses on page 10 are a clear statement of the
questions to which the investigator is seeking answers. Although it is not
always necessary for questions to be in the form of hypotheses in which
predicted results are stated, we approve of the investigator's indication in
this section of the direction in which she predicts the results will appear.
Most experts favor directionally stated scientific hypotheses over those
expressed in the less communicative null form.

In assessing the hypotheses, several student readers questioned the
investigator's methods of measurement, the failure to consider other variables
in the study, and the feasibility of matching students. Valid as these
concerns may be, for convenience they will not be considered at this point in
our assessment of the study.

2. Subjects. The principal technical flaw in the evaluation is that no
control had been exercised over the assignment of children to Head Start or
control programs. Further, because such variables as sex, ethnic background,
age (only a 10 month range), language spoken in the home, and age of siblings
would not be expected to be highly correlated with the measures used in the
study, the reader has little assurance that the two groups being compared were
initially equal in those skills and abilities the Head Start program most
wanted to affect. The investigator also mentioned this problem (p. 14).

We could assess more accurately the likelihood of this initial equality if we
were told in the report the reasons why the control children did not attend
Head Start classes. Did they live too far away from the Head Start center,
come from more stable homes, or live in better neighborhoods? Did the control
children not attend Head Start programs because their parents chose not to
send them? If so, then differences in attitudes toward education (as seen by
differences in the learning experiences provided in the home - learning
experiences such as talking, reading, color identification, etc.) could mean
that the Head Start children would have scored higher than the control
children even before the Head Start experience was begun, and certainly after
an additional year of a better learning situation in the home.






The investigator was wise not to match students on intelligence or other
cognitive or attitude variables measured after the Head Start experience.
If the Head Start program improved the children's scores on such variables,
then matching children on their scores would cancel the very effects to be
demonstrated.

Suppose the investigator had been able to administer identical criterion
measures (verbal fluency, enunciation, etc.) before the Head Start experience
and to match children on the basis of their scores on such measures.
Differences between the two groups would still be expected on these measures
when the children were tested in kindergarten, even if the Head Start program
had no effect in developing the skills and abilities measured by the criterion
tests. Such bogus or false differences can be explained by the regression
phenomenon. (For an elementary discussion of the regression phenomenon, read:
Kenneth Hopkins, "Regression and the Matching Fallacy in Quasi-Experimental
Research", Journal of Special Education, 3, 1969, 329-336.)
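
A small simulation may make the regression phenomenon less mysterious. The
sketch below is an illustration only: it assumes that the two groups are drawn
from populations whose average true scores differ, that pretest and posttest
each equal the true score plus independent random error, and that there is no
program effect whatever. The function name, the population means, the error
sizes, and the matching tolerance ("caliper") are all hypothetical.

```python
import random
import statistics


def matched_gap(n_per_group=20000, mean_a=55.0, mean_b=45.0,
                sd_true=10.0, sd_error=10.0, caliper=0.5, seed=0):
    """Match children from two populations on a fallible pretest and report
    the pretest and posttest gaps among the matched pairs. No treatment
    effect is simulated: the two tests differ only by random error."""
    rng = random.Random(seed)

    def draw_child(population_mean):
        true_score = rng.gauss(population_mean, sd_true)
        pretest = true_score + rng.gauss(0, sd_error)
        posttest = true_score + rng.gauss(0, sd_error)
        return pretest, posttest

    group_a = sorted(draw_child(mean_a) for _ in range(n_per_group))
    group_b = sorted(draw_child(mean_b) for _ in range(n_per_group))

    # Walk both sorted lists, pairing children whose pretests nearly agree.
    pairs = []
    i = j = 0
    while i < len(group_a) and j < len(group_b):
        diff = group_a[i][0] - group_b[j][0]
        if abs(diff) <= caliper:
            pairs.append((group_a[i], group_b[j]))
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1

    pre_gap = statistics.mean(a[0] - b[0] for a, b in pairs)
    post_gap = statistics.mean(a[1] - b[1] for a, b in pairs)
    return len(pairs), pre_gap, post_gap


n_pairs, pre_gap, post_gap = matched_gap()
print(f"{n_pairs} matched pairs")
print(f"pretest gap (A minus B):  {pre_gap:.2f}")
print(f"posttest gap (A minus B): {post_gap:.2f}")
```

By construction the matched pairs are nearly equal on the pretest. On the
posttest, however, each child's score drifts back toward the mean of his own
population, so a gap should reappear; with the assumed values it should be
near half of the 10-point difference between the population means, although
nothing was done to either group.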

We do not fault the investigator for matching students. We merely wish
to point out that such matching was probably largely ineffective in assuring
the equality of the two groups prior to training. Matching on variables
measured before the Head Start programs were begun and which were more highly
related to the criterion variables would have been far preferable. But
even if this were done, the lack of random assignment of children to the Head
Start and control conditions still prevents the ruling out of selection bias
and regression artifacts.

Frequently expressed reactions of student readers are that 35 children
per group is too small a number and that the number of Head Start programs being
evaluated is not mentioned in the article. More data are always desirable,
but an investigator must weigh the increased scope against the increased "costs"
associated with having a larger sample size. The differences between the Head
Start and control groups were sufficiently great that 35 cases per group were
adequate to reject, for most of the variables, the chance-alone null hypothesis.
Perhaps more useful than a larger sample size per se would be having as a sample
children taken from several Head Start programs. We suspect, but are not
certain, that all the children were exposed to the same program and, if this
was the case, the generalizability of the results is very uncertain.

3. Measuring Instruments (selection and construction). Given the rather
limited goal of assessing the comparative performances of the two groups of
children, the measuring instruments used in the study should ideally
represent a diverse collection of reliable and valid devices for measuring
the degree to which the intended skills and abilities have been developed
and unintended ones are absent.






Many student readers objected to the absence of test reliability and
validity data in the report. If a test is unreliable, then it is not measuring
any trait or skill consistently; the test score then has a large component of random
error. Such inconsistency of measurement and random error are to be avoided since
real treatment effects will not be revealed by such unreliable instruments. In
the context of this study, Head Start programs cannot be judged effective if
the measures of effectiveness are largely unreliable. Since the investigator
did find group differences, we can assume that the instruments employed had
acceptable levels of reliability.
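
How unreliability hides a real treatment effect can be made concrete under
classical test theory, a framework we assume here for illustration; it is
not invoked in the report. If reliability is taken as the ratio of
true-score variance to observed-score variance, a group difference expressed
in observed standard deviation units shrinks by the square root of the
reliability. The function name and example values below are hypothetical.

```python
import math


def observed_effect(true_effect, reliability):
    """Standardized group difference expected on a fallible test.

    true_effect -- difference in true-score standard deviation units
    reliability -- true-score variance / observed-score variance (0 to 1]
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in the interval (0, 1]")
    # Observed SD = true SD / sqrt(reliability), so the standardized
    # difference is multiplied by sqrt(reliability).
    return true_effect * math.sqrt(reliability)


if __name__ == "__main__":
    for rel in (1.0, 0.81, 0.49, 0.25):
        print(rel, round(observed_effect(0.8, rel), 3))
```

On this reasoning, a test with very low reliability would leave little of a
true difference visible, which is consistent with the inference that
instruments showing clear group differences cannot have been wholly unreliable.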

"Narrowly considered, validation is the process of examining the accuracy
of a specific prediction or inference made from a test score. ...One validates,
not a test, but an interpretation of data arising from a specific procedure."*
The investigator would probably claim that the test items are representative
instances of the skills being described and, thus, her inferences about children's
capabilities based on their test performance are valid. Such a claim seems
reasonable to us with possibly two exceptions. First, we question whether the
Goodenough Draw-a-Man Test is as much a measure of motor coordination as it is
an indicator of other skills. (Note, the investigator probably meant to say
on page 12 that the test's scales rather than norms were used.) Second, we have
some qualms about the buttoning-own-clothes measure since the task is not the
same for all children. (Some children had harder clothes to button than others.)

Because the specific Head Start programs being evaluated were not described,
we do not know for sure the extent to which the abilities and skills measured
by the tests used in this study do represent the primary objectives of these
programs. Further, we do not know the extent to which the very tasks used in the
tests were used in the training programs themselves. This is not to say that
it would be wrong to use identical tasks in both teaching and testing. It is just
that interpretation of group differences and the value of a program depend upon
knowing the relation of tasks tested to the tasks used in training.

We suggest that in an evaluation study of this type three categories of
tasks be used in the testing: 1) those tasks directly involved in the training
(on which large group differences would be expected); 2) tasks not used in the
training but on which it is hoped there will be group differences; and 3) tasks
representing unintended outcomes (on which no group differences are expected).






* Cronbach, Lee J., "Test Validation," Chapter 14 in R.L. Thorndike (Ed.), Educational
Measurement, American Council on Education, Washington, 1971.






We would like to have seen more of the category two and category three
tasks used in this evaluation. As examples of category two tasks, we would like
to have seen the differences in performance of the two groups on tasks requiring
left-right visual search and production of graphic symbols (e.g., letters).
In addition, as a category two or three task, measures of personal-social
adjustment to school would have also been of interest.

The investigator engaged in good practice, however, in including several
measures of performance rather than relying on just one or two. Where there were
no standardized tests to measure the type of performance on which the investigator
wished to compare the groups, she devised her own tests for these skills and
abilities. This research practice is commendable.

4. Test Administration and Scoring. The importance of administering tests
prior to the start of the Head Start program was mentioned earlier.

The investigator indicates that the instruments were administered, "...
at the beginning of kindergarten in order to insure that these skills and
abilities to be tested were not learned during the kindergarten experience." (p. 13)
Although there is some merit to this procedure, we feel it would have been desirable
if some of the tests had also been administered at the end of kindergarten, or
even later. The critical importance of ascertaining the long-term benefits of
Head Start programs has been well documented by the investigator herself. The
advantage of the Head Start group during the first weeks of the school year may
be due primarily to preschool environment and materials which have no carryover
effect on later learning. Although determining if there is an immediate effect
is useful, it would be of great value to document that a primary goal of Head
Start programs, increased performance in school, was met.

Recall that the measuring was not blinded from the standpoint of the
observer, although the investigator claims on page 14 to have made no effort to
remember which children were in the Head Start group. This is small comfort
to the reader who suspects that the children's membership in either group could
have been independently identified and thus could have biased the judgment of the
investigator as she administered and scored the tests.

The testing was somewhat subjective, both in administration (e.g.,
frequency of directions to be given, probing for termination of responses) and
scoring. (See especially the cutting, coloring and enunciation tests.) Thus,
the results were open to the influence of the evaluator herself. The investigator
is not to be faulted for using instruments which were subjective in nature. However,
using these instruments in such a manner that the subjective element invalidates
the comparison between the two groups is a procedure open to censure.



5. Analysis. The analysis of the data was adequate and not misleading even
though more precise statistical techniques could have been employed. The
investigator could have utilized the exact scores and not have forced them
into two categories (above and below the combined median). Further, the
investigator could have made use of the fact that she had matched pairs of
children. However, these objections carry little weight since the result of
substituting these more refined measures would have been more power (i.e.,
likelihood of rejecting false "no difference" hypotheses) and almost all of the
chance-alone or no difference hypotheses were rejected even without their use.
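
The gain in power from using exact scores and the matched pairs can be
sketched by simulation. The code below is an illustration under our own
assumptions, not a reanalysis of the study: it invents 35 matched pairs whose
members share a common component, adds a hypothetical treatment effect, and
compares how often a median-split chi-square test and a paired t test reject
the no-difference hypothesis at the 5% level. The critical values are
hard-coded for 35 pairs, and all names and numbers are hypothetical.

```python
import math
import random
import statistics

N_PAIRS = 35            # pairs per simulated study (matches the 35 per group)
T_CRIT_DF34 = 2.032     # two-tailed 5% critical value of t with 34 df
CHI2_CRIT_DF1 = 3.841   # 5% critical value of chi-square with 1 df


def median_split_rejects(treated, control):
    """2 x 2 chi-square test on counts above/below the combined median."""
    combined_median = statistics.median(treated + control)
    a = sum(score > combined_median for score in treated)
    b = len(treated) - a
    c = sum(score > combined_median for score in control)
    d = len(control) - c
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return False
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2 > CHI2_CRIT_DF1


def paired_t_rejects(treated, control):
    """Paired t test on the exact within-pair differences."""
    diffs = [t - c for t, c in zip(treated, control)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return abs(statistics.mean(diffs) / se) > T_CRIT_DF34


def estimate_power(effect=0.5, n_trials=2000, seed=1):
    """Return (power of median split, power of paired t) by simulation."""
    rng = random.Random(seed)
    hits_median = hits_paired = 0
    for _ in range(n_trials):
        treated, control = [], []
        for _ in range(N_PAIRS):
            shared = rng.gauss(0.0, 1.0)  # what the matched pair has in common
            treated.append(shared + rng.gauss(0.0, 1.0) + effect)
            control.append(shared + rng.gauss(0.0, 1.0))
        hits_median += median_split_rejects(treated, control)
        hits_paired += paired_t_rejects(treated, control)
    return hits_median / n_trials, hits_paired / n_trials


if __name__ == "__main__":
    print(estimate_power())
```

With a modest assumed effect, the paired analysis of exact scores rejects far
more often than the median split. When the true differences are large, as they
appear to have been in this study, both procedures reject and the loss matters little.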

The investigator is to be commended for not evidencing an unthinking
attachment to a particular criterion of statistical significance. (See Special
Notes on page for an explanation of the 5% criterion used by the investigator.)
Particularly in the case of the cutting-skill variable, the evaluator showed
her willingness to accept evidence of a difference even though the obtained
test statistic fell somewhat short of the critical value needed to claim
statistical significance at the 5% level.

6. Investigator's Conclusions. The investigator is quite correct in stating
that, "...kindergarten children who had attended the Head Start program were
superior to those who had not in each of the skills and abilities tested."
(p. 22) This conclusion is merely a factual statement of the results found.
Even though a few differences did not reach statistical significance, it is a
fact that the Head Start group had superior scores on all the measures.

The investigator is also permitted to say, "The findings support the
current view that culturally deprived children benefit from preschool
programs." (p. 25) "Findings support the current view" is interpreted to mean
that the findings are consistent with the current view, and does not imply that
the results prove that the children benefitted from the programs.

Because of the lack of fundamental controls as specified earlier in our
appraisal, we have no assurance that the differences were due to the Head
Start programs. Thus, we feel the investigator is not justified in making
conclusions that imply the Head Start programs caused the superior performance.
We question the validity of such a conclusion as: "The experiences provided in
the instructional program made it possible for children in the preschool Head
Start project to become more adept..." (p. 24)

Finally, before claiming that results will generalize to other Head Start
projects, we would want to see such positive results from a larger sample of
students and programs.






2. To develop and operate a Head Start program effectively, one would
need to have much financial, legal and political information not touched upon
in the report. To plan the instructional aspects of the program, that is, to
decide what to teach and how and when to teach it, a detailed specification
of the Head Start programs being evaluated in the present article is needed
if the experience reported in the article is to have benefit. Lack of this
specification is a major deficiency of this report.

The reader is left completely in the dark as to the components of the
programs, their duration, the training and number of staff, the objectives of
the programs, the procedures used to achieve these objectives, etc. Without
even the most rudimentary description of the programs, the investigator has
produced an evaluation report not unlike a research report in which the
independent variable was unspecified. As the report now stands, its nearly total
neglect of description of the programs makes it of use only to a small number
of persons who are intimately connected with the programs being evaluated.
No two Head Start programs are alike. Without a description of the programs
herein evaluated, we do not know what programs to perpetuate or how the programs
should be conducted differently. What good is an evaluation that something
works when that "something" is not defined?



Appendix IV 

J. Richard Hackman, Nancy Wiggins, Alan Bass

Prediction of Long-Term Success in Doctoral Work in Psychology

Educational and Psychological Measurement
1970, 30, 365-374






1. Question:

What were the investigators hoping to achieve? That is, what was the
purpose(s) of the study?

1. Answer:

We think the investigators had two primary purposes which are well stated
in the opening and closing sentences of the initial paragraph of the
article: a) to examine, "...the degree to which measures of aptitude and
undergraduate preparation obtained before the beginning of doctoral study
are predictive of the (short and long-term) 'success' of psychology
graduate students"; b) "...to determine the degree to which evaluations
made at the end of the first year of doctoral work are congruent with the
long-term assessments of success in the program." The relationships
mentioned in purposes a) and b) above are shown in Tables 1 and 2
respectively.

Student Responses. Several students inferred that the investigators were
trying to make predictions rather than just to gather information about
relationships between predictors and criteria. They claim that, "...the
purpose of the study was to find a kind of cause/effect relationship, so
that the Graduate School at the University of Illinois or other graduate
schools can make specific recommendations to undergraduate institutions,
to future students, and to faculty members about changing or maintaining
certain practices."

Our Reply. Worthwhile as such a purpose might be, the investigators did
not state it as their aim. If their purpose were to devise a prediction
system which could be used by educators, they no doubt would have then
followed the recommended practice of cross-validating their results; that
is, trying out the system on a student group different from that used to
develop the prediction formula.*



2. Question: 



(a) How many specific pre-enrollment predictors (not groups or
categories) were used? Your answer should be a specific numerical value.

(b) How many specific criteria were used?

(c) Was it a good idea to employ so many variables in a single study?
Why or why not?






2. Answer:

(a) Thirteen predictors were used. These predictors are listed in the
left-hand column of Tables 1 and 3 as well as in the body of the article.

(b) Ten criteria were employed; all but one of these are considered
short-term criteria. These criteria are listed at the tops of the columns
in Table 1, in Table 2, and in the body of the article.






(c) We approve of using multiple predictors and criteria in any study for
two reasons. First, we are rarely interested in a dependent variable which
can be perfectly measured by a single variable. A good case in point is
the present study in which "success" is clearly a complex concept; the
more aspects of success we study the better. Second, the more independent,
or predictor, variables included in a study, the more information we
obtain about the relationships we are interested in. Study of a greater
network of interrelationships aids in comprehending and explaining the
reasons for the relationships.

* For an entertaining account of how failure to cross-validate a
prediction system can lead to astounding and unfounded claims, read:
E. Cureton, "Reliability, Validity and Baloney," Educational and
Psychological Measurement, 1950, 10, 94-96.

On the other hand, use of variables poorly measured or lacking rationale
for their inclusion should not be encouraged. It should be kept in mind
that when a great many relationships are studied, it is probable that some
bogus, "significant" ones will appear. Thus, caution is required in
interpreting isolated findings. Further, for statistical reasons involving
the stability of the prediction equation coefficients, there are too many
predictors (for so few students) to construct a prediction system that
would be expected to work well (cross-validate) on a different sample. The
investigators were wise to focus their analyses on simple, two-variable
relationships.
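
The warning about bogus "significant" relationships can be given rough
numbers. The sketch below assumes, for illustration only, that each of the
thirteen predictors is correlated with each of the ten criteria, that every
test is run at the 5% level, and that no real relationships exist. The
probability of at least one false finding further assumes the tests are
independent, which they certainly are not in practice; the function name is
hypothetical.

```python
def chance_findings(n_predictors=13, n_criteria=10, alpha=0.05):
    """Expected number of spurious 'significant' correlations.

    Assumes every predictor-criterion pair is tested at level alpha and
    that all null hypotheses are true. The probability of at least one
    false positive additionally assumes independent tests.
    """
    n_tests = n_predictors * n_criteria
    expected_false_positives = n_tests * alpha
    prob_at_least_one = 1.0 - (1.0 - alpha) ** n_tests
    return n_tests, expected_false_positives, prob_at_least_one


if __name__ == "__main__":
    n_tests, expected, prob_any = chance_findings()
    print(n_tests, expected, round(prob_any, 4))
```

Under these assumptions 130 relationships are examined and six or seven would
be expected to reach "significance" by chance alone, which is why isolated
findings deserve caution.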






3. Question: 

The predictors used are categorized into 
four groups: 

(a) Aptitude and ability 

(b) Foreign language facility 

(c) Undergraduate grades 

(d) Rated quality of undergraduate school. 

Evaluate the appropriateness of the specific measures employed in each
group of predictors. (In your answer, focus upon whether these measures
were reasonable choices and not upon whether, in fact, they seemed to work
in this particular study.)






3. Answer:

a) Aptitude and Ability Predictors. At least for short-term success,
aptitude measures have been shown to be good predictors. The Graduate
Record Examination (GRE) tests are widely employed and have proved useful
in the past. The GRE correlations serve as a useful benchmark against
which to judge the magnitude of relationships found with other predictors.
Both past performance and current practice argue for inclusion of these
test scores.

b) Foreign Language Facility. Unfortunately the reader has to wait until
the very end of the paper before he is given the rationale for including
these predictors. The argument is not terribly convincing. We have no
objection to the inclusion of foreign language facility but suspect more
interesting and meaningful predictors could have been found. The three
specific measures employed in this category leave much to be desired as
true indicators of foreign language facility. They may have been used
because they were handy. Their inclusion is no crime; it is just that they
are not apt to be very enlightening.

Student Responses. In the evaluation of several of these predictors, as
well as in the evaluation of some of the criteria (see Question 4), a
large number of students were critical of the subjectivity involved in the
measures. Many students went so far as to say that some measures were
"worthless" or "should not be used" because they were subjective.

Our Reply. "Subjectivity" can have two meanings. In one sense,
subjectivity means based on personal experience or a matter of opinion. In
another sense, it means unreliable and that judges do not agree. A doctor
should not dismiss a patient's complaint of pain because it is based on
personal experience or because other judges cannot agree on the amount of
pain involved. Likewise, we would caution researchers against an off-hand
dismissal of all subjective measurements. The phenomena we may have the
greatest difficulty measuring will sometimes be those most worth
measuring. One must often ask whether it is better to measure something
trivial well or to measure something important poorly.






c) Undergraduate Academic Performance. It is reasonable to use grades as
predictors for the same reasons that the aptitude measures should be
included. Of course, grades from different institutions are not completely
comparable since a C at one institution may show greater achievement than
a B at another institution. Nevertheless, even with such a deficiency,
grades have been found to be useful predictors in the past and should be
included.


Analyzing the grade record by specific course has the advantage of making
the grades somewhat comparable, although this comparability is achieved at
the loss of reliability. Grade averages based on one or several courses
simply are not as reliable as more composite measures for the same reason
that tests with one or only a few items are not as reliable as total
scores computed on many-item tests. We further wonder why (but are not
critical that) grades earned during the first two years of undergraduate
study and the number of semester hours of psychology were not included as
predictors.
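
The relation between the number of components in a measure and its
reliability can be sketched with the Spearman-Brown formula. We bring in this
formula for illustration; it is not named in the article under review, and it
assumes the components (items or course grades) are parallel, that is,
equally reliable and equally intercorrelated. The function name and the
reliability of 0.30 for a single course grade are hypothetical.

```python
def spearman_brown(single_reliability, k):
    """Reliability of a composite of k parallel components.

    single_reliability -- reliability of one component (0 to 1)
    k                  -- number of parallel components combined
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return (k * single_reliability) / (1.0 + (k - 1) * single_reliability)


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 30):
        print(k, round(spearman_brown(0.30, k), 3))
```

On these assumptions an average over many courses is far more dependable than
a grade in any one course, which is the price paid for the comparability of
course-by-course analysis.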

Student Response. "Doesn't the regression phenomena enter in here? One
presumes a 2.75 or 2.8 cut-off so you'd be looking mainly at very high
grades to start with."

Our Reply. The subjects are a select, extreme group whose performance on
other measures is expected to regress toward more average levels. Because
this group is not being compared with other groups, however, this
regression effect is not a source of bias. The student does suggest a
reason why undergraduate grades (and other measures used in student
selection) might not be as highly related to the criteria as one would
hope. Presumably the 42 subjects in the study all had quite good
undergraduate grades (or else they would likely not have been admitted to
graduate school). A predictor which does not discriminate among the
students (that is, the students' performances are relatively homogeneous)
is not likely to correlate highly with a criterion. Had all the students
who applied to the doctorate program been admitted, regardless of their
undergraduate grades or aptitude test scores, the correlations involving
these predictors would undoubtedly have been greater.*
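
The effect of selection on a correlation can be illustrated by simulation.
The sketch below is hypothetical throughout: it invents a pool of applicants
in which a predictor correlates about 0.6 with a criterion, "admits" only the
top tenth on the predictor, and recomputes the correlation among those
admitted. None of the numbers comes from the article.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def range_restriction_demo(n_applicants=20000, rho=0.6,
                           admit_fraction=0.10, seed=3):
    """Return (correlation among all applicants, correlation among admitted)."""
    rng = random.Random(seed)
    applicants = []
    for _ in range(n_applicants):
        predictor = rng.gauss(0.0, 1.0)
        criterion = rho * predictor + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        applicants.append((predictor, criterion))

    # Admit only the applicants with the highest predictor scores.
    applicants.sort(key=lambda person: person[0], reverse=True)
    admitted = applicants[:int(n_applicants * admit_fraction)]

    r_all = pearson(*zip(*applicants))
    r_admitted = pearson(*zip(*admitted))
    return r_all, r_admitted


if __name__ == "__main__":
    print(range_restriction_demo())
```

In this invented pool the correlation within the admitted group is markedly
lower than in the full pool, even though the underlying relationship is the
same for everyone.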

d) Quality of Undergraduate Institution. Because grades at different
institutions are not comparable, we see the inclusion of this variable as
a wise decision both in studying its relationship with the criteria
directly, and as a variable to use in adjusting undergraduate grade
averages. More information about the number, nature, procedures and
criteria used by the committee to arrive at the quality ratings would be
helpful to the reader who might wish to use the same variable in a local
prediction study. Failure to specify fully how this variable was measured
makes it of limited use to others.



Further, we wonder if the judges' ratings of particular institutions could
have been biased by knowledge of which students came from which
institutions. The judge, for example, might think more highly of
institution X because student A, who is given high ratings on the
"success" criteria, came from that institution. Conversely, perhaps
students coming from institutions thought highly of were expected to do
well and a self-fulfilling prophecy was in operation. This latter possible
explanation for the positive correlation between the quality of
undergraduate institution and most measures of "success" was noted by the
investigator. As one student put it: "It looks like a case of circular
reasoning. What schools are rated excellent? Those whose students do well
at this university. And what students are successful at this university?
Those who come from schools that are rated excellent! Shall we go another
round?"



* To see why this is the case, consider three persons whose IQs are 110,
111 and 112 (very homogeneous scores). There is no predicting who would do
best in college. If their IQs were 50, 100 and 150 (mentally retarded,
average and gifted), making correct predictions would be easy.




Student Responses. "There should be categories between 1 and 2 and between
2 and 3 which many schools would fit into more fairly and accurately."
"The 3-point scale range is too small as to make the differences
practically useless."

Our Reply. We disagree. Even 2-point scales (such as above average, below
average) have been found to be very useful in predicting criteria.
Although we certainly have no objection to a finer scale, the experience
has been that added scale values result in rather meager gains in
predictability.
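
How much predictability is gained from a finer scale can be sketched with
simulated data. The example below is our own illustration, not an analysis
from the article: it assumes a normally distributed underlying quality that
correlates 0.5 with a criterion, then rates that quality on 2-point, 3-point
and 5-point scales using cutpoints we chose arbitrarily.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rate(value, cutpoints):
    """Scale value assigned to a score: the number of cutpoints it exceeds."""
    return sum(value > cut for cut in cutpoints)


def coarsening_demo(n=20000, rho=0.5, seed=2):
    """Correlation with a criterion for ratings of differing fineness."""
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
    criterion = [rho * q + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
                 for q in quality]
    schemes = {
        "2-point": [0.0],
        "3-point": [-0.6, 0.6],
        "5-point": [-1.2, -0.4, 0.4, 1.2],
    }
    results = {"continuous": pearson(quality, criterion)}
    for name, cutpoints in schemes.items():
        ratings = [rate(q, cutpoints) for q in quality]
        results[name] = pearson(ratings, criterion)
    return results


if __name__ == "__main__":
    for name, r in coarsening_demo().items():
        print(name, round(r, 3))
```

Under these assumptions the 2-point scale already retains most of the
correlation obtainable from the underlying quality, and each added scale value
contributes a smaller increment than the one before.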



4. Question: 

The criteria of short term "success" are divided into three groups:

a) Grades earned in first year of graduate school

b) Self-report measures

c) Faculty ratings

Evaluate the appropriateness of the specific measures used in each of
these groups of criteria. (In your answer, focus upon whether these
measures were reasonable choices and not whether, in fact, they related to
the long term "success" criterion.)






4. Answer: 

a) Grades earned in first year graduate school. Grades have typically been
used as measures of success. It is a good idea to include them both
because they are considered important and because they can be used to
study their relationship to long-term success. We believe the
investigators were wise to separate out the grades earned in core courses,
for at least these grades would be comparable among the students. One
student's grade in physiological psychology and another's grade in
abnormal psychology, for example, might not be comparable. The fact that
these core courses were included in the end-of-year average means that
this composite measure will have a built-in dependency (correlation) with
the other criteria in this category.

b) Self-report measures. These two measures seem to be reasonable
indicators of speed and persistence toward achieving the Ph.D.

Student Responses. "Why was the student rating expressed as slow or fast
progress to a Ph.D. rather than feeling of satisfaction with progress
whatever the speed?" "A student can be deeply engrossed in his study for
learning's sake and be highly successful and motivated and yet be totally
unconcerned with his speed toward his Ph.D. It is unfortunate that large
universities often place the degree above the actual learning taking
place."



Our Reply. Of course, other student self-report measures could have been
used. We suspect that administrators and professors associated with the
degree program were more concerned about the students' perceptions of
their actual progress than these students' feelings of satisfaction about
their progress. Since the investigators did include grades earned in first
year graduate school among the criteria, "actual learning" was not ignored
in this study.

Student Response. "Self-report measures should not be obtained after the
grades were issued, but before."

Our Reply. We found this to be an interesting reaction to which we could
both agree and disagree. By requiring the student to report his progress
before he receives the formal grades, we can obtain a measure of how he
truly thought he was progressing and such an evaluation might be less
contaminated by faculty opinion. On the other hand, by permitting the
student access to the formal grades as information to use in making a
considered judgment of his progress, a more realistic estimate of his true
progress might result.




c) Faculty ratings. Such categories as, "excellent progress, assured of
financial aid", and, "dropped from graduate program", simply do not seem
to be points on a common scale. These ratings appear to encompass a whole
gamut of possibilities, including good performance, persistence and
voluntary withdrawal, and we would like to have seen all of the scale
values for these ratings. Further, the shorthand labels of these variables
in Tables 1 and 2 do not seem especially appropriate.



Student Response. Many students felt that, "there could be bias in a
faculty member's opinion", and that faculty ratings, "...are an unfair
criterion." Further, many students wanted, "...to know how divergent the
various faculty members were in the rating of the same student."

Our Reply. We agree that faculty ratings might have bias and not be a fair
rating of a student's real progress. In addition, as one student pointed
out, "...faculty ratings can be influenced by predictor ratings." (We
discussed this a bit toward the end of our answer to Question 3d.) Since
each student's faculty ratings were averages of several ratings, the
influence of a single professor's bias or susceptibility to contamination
by a predictor was lessened. Although we recognize faculty ratings will
have at least some shortcomings, we nevertheless support the use of
faculty ratings as a criterion of short-term success. The fact that the
hiring of recent Ph.D.s depends heavily on the recommendations of the
students' professors serves to remind us that colleges and universities
consider such faculty ratings to be a suitable criterion.



5. Question:

A key concept in this study is "long-term success."

a) Give two or more reasons for rejecting the definition given this term
in the paragraph starting at the bottom of page 367.

b) How might this definition be defended?



ERLC 



Hackman -10 



5. Answer:

a) Before listing many objections to the definition of long-term success,
it is necessary to point out a confusion on the part of several students.
The professors of the students did NOT make the long-term success ratings.
Rather, these professors were asked only to define what they thought
"constitutes 'success' for a psychology doctoral student." Based on these
answers, the investigators, in some unexplained way, constructed the
9-point long-term success scale. Two judges (probably two of the
investigators or their assistants) then made the rating using the
information (available at the time the student left the university) of
where the student was going and the "circumstances" of his leaving. Thus,
for practically all students, the long-term success index was determined
from data available well before six years had elapsed.

Probably the most serious criticism of the long-term success measure is
that it fails to consider many factors commonly thought of as indicators
of success. Not included, presumably (presumably because we do not know
the intermediate scale values), are such indicators as quality of
teaching, service to the profession, grants awarded, number and quality of
publications, etc. "Acceptance to a highly prestigious institution" is not
usually thought of as the only or even the most valid indicator of
long-term success. The incompleteness and irrelevancy of the measure of
long-term success is clearly the most serious flaw in the study.



Typical student comments which we included under this first objection are
the following: "There seems to be little concern for the performance on
the job in this research." "Those who drop out are automatically excluded
from being judged successful." "It would be possible for a student to
withdraw from the program and later continue the study of psychology and
be successful."




Second, we believe that the success measure
would be strengthened if it took into account
at least the program from which each person
has graduated. For example, a long-term
success measure for a graduate of the clinical
program might be number of patients or fee charged
per client.

Third, we agree with one student who points
out that the investigators' long-term measure
is, "...not measuring long term success; it is
merely rating a student as to the circumstances
under which he left the University. To measure
long-term success his career has to be followed
up after he left." Other students worded this
objection as follows: "Success cannot be measured
immediately after graduation. Determination of
long-term success must be made after a period
of time has elapsed." "Notation was made of
where a student intended to go but no follow-up
on the students was made." "The 'long-term
success' is not long enough. A person may have
accepted a prestigious position, but may not have
been able to retain it." "The use of the word
long-term is unfortunate. The long term aspect
of the question would deal with careers."

Fourth, it should be noted that the 9-point
scale is not fully identified. We can only
speculate as to what description (if any) is
given to the intermediate points. As we indicated
earlier in our discussion of ratings of short-
term success (see our answer to Question 4 c),
the points on the long-term success scale do
not seem to be tapping a common dimension.
It is difficult to know where to place a person
who, for example, drops out of a program but
yet demonstrates "success" in other ways.

Fifth, as one student pointed out, "The
definition may be rejected on the basis of the
narrow sampling of experts used in determining
what is and what is not success. They are
professors at the same institution in the same
department who are probably prone to similar
thoughts on an issue such as this." The restricted
nature of the sample (number not given) of
faculty whose opinions were used in developing
the long-term success scale increases the like-
lihood that the criterion will not seem appro-
priate to other faculty groups.

Finally, note a built-in dependency between
short- and long-term criteria. If a person
drops out of a graduate school he must necessarily
receive a low rating on both the short-term
and long-term assessment. Thus, the relation-
ship between long-term and short-term criteria
is almost predetermined even though the study
of this relationship is presented as a primary
objective of this research.

Student Responses. "With only two judges,
what does a reliability of .95 mean?" "I have
difficulty with the 'inter-judge reliability'
which was .95 when there were only two judges."

Our Reply. The records of the 42 students
were rated on the 9-point scale twice, once by
each judge. The correlation coefficient computed
on these 42 pairs of ratings was .95. These
two judges agreed almost perfectly on the rela-
tive ratings assigned to the students. A
reasonable inference is that the high agreement
resulted from a clear definition of the scale
points.
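
To illustrate why a correlation can be computed with only two judges, and why it reflects agreement on relative rather than absolute ratings, here is a minimal sketch. The ratings are hypothetical and are not data from the study; the function name pearson_r is our own.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ratings of eight student records on a 9-point scale.
judge_1 = [1, 2, 4, 5, 5, 6, 7, 8]
# Hypothetical second judge who is always one point more generous.
judge_2 = [rating + 1 for rating in judge_1]

r = pearson_r(judge_1, judge_2)
exact_agreements = sum(a == b for a, b in zip(judge_1, judge_2))
print(round(r, 2), exact_agreements)
```

In this made-up case the two judges never assign the same number, yet the correlation is essentially perfect because they order the records identically. The coefficient is computed over the pairs of ratings (42 pairs in the study), not over the judges.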

b) The senior author of the article, in personal
communication, defended the definition of long-
term success by writing: "I believe it is im-
portant to be able to predict things like
which graduate students are most likely to
flunk out vs. withdraw vs. get a Ph.D. and take
a job at Podunk University vs. get a Ph.D.
and accept a job at a prestigious university
such as Cornell. Certainly the faculty of
graduate schools feel that such 'long-term'
criteria are important."



6. Question:

Are investigators permitted to define key terms
(such as "long-term success") any way they
wish? Explain why you answered as you did.

6. Answer:

Our first three objections to the investiga-
tors' definition of "long-term success" (see
our answer to Question 5 a) suggest what we
think the term to mean. In commenting on our
critique, the senior author wrote us: "Long-
term, which to us meant 'after the end of a
student's graduate education' apparently
implied some kind of career-long perspective
to Messrs. Millman and Gowin. They can mean
whatever they want to, but in discussing our
study I'd suggest they talk about our operational
measure (which indeed has some problems) rather
than focus entirely on the name we put on our
measure."



We would agree that investigators should
be permitted to define their terms as they like.
On the other hand, they do have an obligation
to foster accurate communication about their
work and this goal is not sufficiently achieved
when labels are used which convey meanings
markedly different from those intended. In
such situations, it is often possible to be
misled into thinking that accounts of a study
are more generalizable and significant than
they actually are because persuasive labels (such as
long-term success) are given specific meanings.



7. Question:

A student reviewer of this study stated that
this investigation merely demonstrates what
was already common knowledge. Do you agree?
Support your answer.

7. Answer:



We disagree. We were surprised, for example,
that the investigators found negative correla-
tions between undergraduate grades and their
"global assessment of success" rating. In
spite of our misgivings about this "long-term
success" index we would have anticipated at
least small positive correlations. Further,
we would not have expected rated quality of in-
stitution to be such a good predictor of long-
term success. Many readers would not have
predicted these findings and they could not
be considered common knowledge.



Student Responses. "I thought that
facility in a foreign language would be a great
asset towards long term success in doctoral
studies." "I was surprised that GRE-Quanti-
tative and GPA mathematics correlated .00." "I
didn't expect there to be so many negative
correlations."



Our Reply. There is probably very little
that is common knowledge. Like beauty, sur-
prise is in the eyes of the beholder.



8. Question:

Questions 8 a) through 8 d) are based upon
the following sentence quoted from the first
paragraph on page 371:

For example, the quality of the under-
graduate school, which predicted first
year grades negligibly, was found to be
significantly related to student and
faculty global assessments of progress
toward the Ph.D. at the end of the year,
and correlated more substantially than
any other predictor with the long-term
criterion of success.

Look again at the tables in the report of
the study.



a) Find those numbers which indicate the de-
gree of relationship between quality of the
undergraduate school and the other variables
mentioned in the quotation. What numerical
values of correlations did the investigators
find to lead them to make their statement
which is quoted previously?

8 a) Answer:



From the last line of Table 1, the "negligible"
correlations between rated quality of undergra-
duate school and graduate school grades are:
•OQg -.13, .21, .16, and .15. The significant
relations to students and faculty ratings of
progress toward the degree are .30 and .31,
also found in the last line of Table 1. The same
line also shows a .43 correlation with the
long-term measure of success.

8b) Question: 

Can a negligible correlation be statistically 
significant from zero? 

8 b) Answer: 

Yes. Some people would say that for prediction
purposes the significant correlations referred
to in the quotation of .30 and .31 are negli-
gible. The "negligible" correlation of .21
would be statistically significant if 88*
instead of 42 students were involved in the
study. Correlations of .30 and .31 would not
be significant at the 4 1/2% level of signi-
ficance. We wish to make two points. 1) In
this context, negligible is an adjective des-
cribing the magnitude of a correlation; sig-
nificance describes a different attribute: the
likelihood of correlations of a given value
occurring in a random sample of a population in
which the actual correlation is zero.



All four combinations of these two descriptive
adjectives are possible: negligible, significant;
negligible, not significant; not negligible,
significant; not negligible, not significant.
2) One must be careful not to have an unthinking
attachment to correlations (or other statistical
indices) which are barely statistically signifi-
cant at some level of confidence, and disdain
for correlations which do not quite make the
cut-off between significant and not signifi-
cant. (Note our answer to Question 8 d).

* Arrived at by a t test of the significance
of a correlation from zero.
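
The dependence of significance on sample size can be checked with a short calculation. The sketch below uses the usual t test of a correlation against zero, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and assumes a two-tailed test. The function names are our own, and the p-value is obtained by numerical integration (Simpson's rule) of the t density so that only the standard library is needed.

```python
import math

def t_for_r(r, n):
    """t statistic for testing a correlation r from n pairs against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value; steps must be even for Simpson's rule."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * half_central_area

def p_for_r(r, n):
    """Two-tailed p-value for a correlation r observed on n pairs."""
    return two_tailed_p(t_for_r(r, n), n - 2)

for r, n in [(0.21, 42), (0.21, 88), (0.31, 42)]:
    print(f"r = {r:.2f}, n = {n}: t = {t_for_r(r, n):.2f}, p = {p_for_r(r, n):.3f}")
```

Under these assumptions, the same correlation of .21 falls short of the 5% level with 42 students and just reaches it with 88, while .31 with 42 students lands between the 4 1/2% and 5% levels. The magnitude of the correlation is unchanged throughout; only the verdict about chance changes.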



8 c) Question:

For these 42 students, which pre-enrollment
predictor was able to predict the long-term
global assessment criterion most accurately?
On what evidence do you base your answer?

8 c) Answer:

Undergraduate GPA in the physical sciences
is the best predictor. The correlations
between the pre-enrollment predictors and the
long-term success criterion are given in the
last column of Table 1. GPA in the physical
sciences has the highest correlation (-.50)
and, thus, would predict the criterion best.
Because the correlation is negative, students
having a low GPA in physical science would
be predicted to have the highest long-term
success rating and vice versa. (The predictor
having the highest positive relation with
the long-term success criterion is, as the
investigators state in the sentence quoted,
quality of the undergraduate school.)



8 d) Question:

If the appropriate statistical test were
run, guess whether the quality of the under-
graduate school would correlate with the
long-term success criterion significantly more
(in the statistical sense) than, say, the
Quantitative score on the GRE?

8 d) Answer:

This difference is not statistically signi-
ficant. It is true that both correlations
are significantly different from zero at the
5% level. This fact is indicated by the
asterisks affixed to the correlations of
.32 and .43 in the last column of Table 1.
The point being illustrated is that corre-
lations which are significantly different
from zero need not be, and indeed frequently
are not, significantly different from each
other. A clear cut example might be two
correlations of .80 and .79 being each sig-
nificantly different from zero but having a
difference (.01) that is not significant.
The investigators did not test the difference
between correlations in any of the tables and
statements comparing the relative sizes of
them should be made cautiously.
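
A rough check of this point can be sketched with Fisher's z transformation. Note the simplifying assumption: the formula below treats the two correlations as if they came from independent samples of 42 each. In the study both correlations come from the same 42 students, so a test for dependent correlations would be the appropriate one; it requires the correlation between the two predictors, which is not quoted here. The function name is our own.

```python
import math
from statistics import NormalDist

def fisher_z_difference(r1, r2, n1, n2):
    """z statistic and two-tailed p for the difference between two
    correlations, ASSUMING they come from independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# The two correlations discussed in the answer (.43 and .32), n = 42.
z_study, p_study = fisher_z_difference(0.43, 0.32, 42, 42)
# The "clear cut" illustration of .80 versus .79.
z_clear, p_clear = fisher_z_difference(0.80, 0.79, 42, 42)

print(f".43 vs .32: z = {z_study:.2f}, p = {p_study:.2f}")
print(f".80 vs .79: z = {z_clear:.2f}, p = {p_clear:.2f}")
```

Under this approximation neither difference comes close to conventional significance levels, which is consistent with the caution urged above.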

9. Question: 

One of the findings of the study, as pointed 
out in Question 8 earlier, is that rated
quality of undergraduate institution has 
a fairly high correlation with long-term 
success. Does this mean that if you were 
an admission officer in the Psychology De- 
partment at the University of Illinois and 
primarily interested in this measurement of 
success you should give preference to students 
coming from highly rated undergraduate insti- 
tutions? Why or why not? 



9. Answer:

If, and this is a big if, you were interested
in predicting this long-term success measure,
then yes, you should give preference to
students coming from undergraduate institutions
rated highly by the same (vaguely described)
procedures used in this study. (Quality of
undergraduate school should not be the only
factor considered, of course.) It is true
that this high correlation may not show up in
another sample, but chances are better that
the variable will be positively related than
that the relationship will disappear. The exis-
tence of a high correlation between quality of
undergraduate institution and long-term success
does not mean that the quality of undergraduate
institution caused the students to have long-
term success - indeed, the same forces which are
responsible for a student's selecting (or being
selected by) a highly rated institution might
be operating at the time he selects (or is being
selected by) his employer upon his graduation.

Student Responses. "I would not give
any preference to those students coming from
a highly rated institution. What a person puts
into an institution is what he will get out
of it." "Absolutely not. To me the entire
record should be evaluated and equal weight
given to all variables, insuring fairness and
a chance for the student to achieve this goal
if he really has the desire to try." "As an
admission officer in the Psychology Department
at the University of Illinois, I wouldn't
show a preference to students coming from
highly rated undergraduate institutions. I
would, however, carefully consider all information
in the folders of all applicants." "The GREs
would be important to me as a comparison of
the individual students so this would definitely
affect my decision." "I would not give any
preference to students coming from highly
rated institutions. The rating scale was too
narrow and biased."

Our Reply. The above comments of student
readers appear to be a denial of the facts in
the case; namely that the quality of under-
graduate institution was predictive of the
long-term success measure. The rating scale
may indeed be narrow and biased, but it worked.
Most of the other information in the folders,
particularly undergraduate grades and some GRE
scores, has either low or negative correlations
with the success measure or unknown predictive
validity for this criterion. That is, there
is not sufficient evidence to believe that these
other variables will be effective in predicting
the student who will be rated high on the long-
term success measure. One can find good reasons
for objecting to the appropriateness of this
criterion, but that is not the issue under
consideration. (Reread Question 9.)

Student Responses. "It is more likely
that strong individuals are selected by high
quality institutions. Therefore, the judgment
should be made on the basis of the individual
and not the institution." "The long-term
'success' of students coming from highly rated
undergraduate institutions may be due in part
to the self-fulfilling prophecy. Students
coming from a highly rated undergraduate insti-
tution may get hired by prestigious universities
because such universities may expect him to do
well merely on the basis of his undergraduate
school."

Our Reply . The student who made the first 
response overlooks the fact that it was the insti-
tution and not the "individual" predictors
that worked. Both students, and many other 
readers whose responses we did not quote, quite 
properly attempted to explain the reason for the 
success of the undergraduate institution quality 
rating as a predictor. 

Student Response . "Students must be selected 
for more justifiable reasons than what the 
names of their undergraduate schools were.
A better measure must be found."

Our Reply. In this student reader comment
the frustration of many of us is given expression.
One student discussed the dilemma in these words:

Being intelligent, perceptive, and
liberal, I, of course, would not discrimi-
nate against a student from a low-rated
school. However, if I had to bet on which
student would succeed, I would go with a
student from a high-rated school. These
statistics indicate I would have a better
chance of winning.






An example of another area of concern may
help to clarify the situation. Suppose you own a
company that produces screws and nuts (metal variety).
The more screws and nuts turned out by an employee,
the more money you, as a company owner, will earn.
Your personnel office reports a company study in
which, let us assume, race correlates .43 with
production - white employees producing more per
man hour than black employees. Now, you know that
skin color per se is not the cause of this differ-
ential production, and that there is some more basic
reason. But, while you ponder the underlying
causes, a vacancy occurs and two applicants, one
black and the other white, apply for the job. The
applicants are equal on all other factors you usually
consider. Whom do you choose? If money is the
only criterion you would "bet" on the white man.
If, as company owner, you are willing to consider
more unselfish motives relating to, say, society's
needs, you might give the black applicant a chance.

None of the predictors in the present graduate
school study tells the whole story and they may well
discriminate against the poor, the late bloomer,
and the special student. An admissions officer
may try to be fair to such people by deviating
from total reliance on the best predictors available
to him by choosing individuals with a lower prob-
ability of success. When he does so, it is because
he feels that criteria other than his success measures
are important. Of all institutions, educational
ones are perhaps best able to afford using multiple
indicators of success.

It would certainly be nice if each person
could, "...be given the chance to fail or to succeed
on his own without a survey telling him he can or
cannot do it." When demand greatly exceeds supply
(e.g., only some of the applicants to graduate school
can be admitted), not everyone can be given that
chance, and choices have to be made.



The Influence of Analysis and Evaluation Questions
on Achievement in Sixth Grade Social Studies



Francis P. Hunkins
Educational Leadership Research Supplement
January 1968, pp. 326-332.



SPECIAL NOTES 

Page 326. The Taxonomy of Educational Objectives is a book by Benjamin
Bloom and others in which are described types of cognitive abilities organized
into the following categories: knowledge, comprehension, application, analysis,
synthesis and evaluation. Each of these categories is further subdivided into
more specific skills and abilities. The authors of this volume hypothesize
that these six major categories are hierarchically arranged with knowledge at
the bottom of the scale and evaluation at the top, and with each step of the
scale dependent upon mastery of previous categories. Thus, for example, they
hypothesize that an individual cannot properly evaluate (category 6) a statement
about, say, atoms without first learning certain facts about atoms (#1),
comprehending certain ideas about them (#2), being able to analyze these facts
and ideas (#4), and so on.

Page 330, Table 1. Recall from the design that there were two treatments
(Condition A in which questions requiring analysis and evaluation were stressed,
and Condition B in which questions requiring only knowledge were in the majority),
four reading levels, and the two sexes. Each student was considered to fall into
one of these 16 possible categories (i.e. 2 x 4 x 2 = 16). One category, for
example, would be Condition A, reading level 2, girl.
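
The 16 categories can be enumerated directly. The sketch below simply crosses the three factors named in the note; the labels used for the levels are our own.

```python
from itertools import product

treatments = ["A", "B"]          # Condition A, Condition B
reading_levels = [1, 2, 3, 4]    # four reading levels
sexes = ["girl", "boy"]

# Every combination of treatment x reading level x sex is one category.
cells = list(product(treatments, reading_levels, sexes))
print(len(cells))

# The example category from the note: Condition A, reading level 2, girl.
print(("A", 2, "girl") in cells)

# A comparison between treatments sets the categories involving
# Condition A against those involving Condition B.
cells_with_a = [cell for cell in cells if cell[0] == "A"]
print(len(cells_with_a))
```

Half of the 16 categories involve each condition, which is why a treatment comparison is a comparison between two sets of eight categories.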

Table 1 shows the results of a statistical analysis designed to test if
there were significant differences in achievement scores among students in cer-
tain combinations of the categories. In each of the first seven rows of the
table are reported the results of an analysis involving a different such com-
parison. The Source of Variation column identifies the comparisons involved.
By Treatment is meant the comparison between Condition A and Condition B,
or more precisely, between the scores of students in the eight categories in-
volving Condition A with the scores of students in the eight categories involving
Condition B. Similarly, the reading level comparison involves a test of the
significance of the differences among the scores of students in the four reading
level conditions. If boys score significantly higher than girls (or vice versa),
it will be reflected in the results of the sex comparison shown in row three.
By statistically significant is meant that the
differences are sufficiently large that it is unlikely that they could occur
by random sampling, or by chance alone.

The next four rows involve interaction comparisons. Since these inter-
action effects were neither significant nor of much concern in this study, they
will not be discussed further here. The concept of interaction is discussed
in regard to other articles in this series.

The last effect represents differences in scores among the students within
each of the 16 categories. These differences are not tested for significance;
rather they serve as a base from which to evaluate the other differences as-
sociated with the first seven effects.

The investigator performed two kinds of analyses - one involving the actual
achievement scores and the other involving scores that were "adjusted" for pre-
test score differences. In both cases, the d.f. column represents degrees of
freedom which relate to (but do not exactly equal) the number of groups being
compared. The S.S. columns and the M.S. column stand for the sum of squares and
mean squares respectively, and are intermediate calculations in the analysis of
variance.

The numbers in the F column are used to indicate if results are statis-
tically significant. For a given number of degrees of freedom (d.f.), the higher
is the F number the more unlikely that chance alone could account for the differences,
and the more statistically significant the results. You'll note that the dif-
ference in test scores (when adjusted for preachievement score differences) of
students in Conditions A and B, and of students in the four reading conditions,
were statistically significant.

Page 330, column 1, lines ... and 15. The symbol, >, means "greater than"
and the Q represents quarter. Q4 > Q3 means that the mean achievement
scores of students in the fourth (i.e. top) quarter in reading was significantly
greater than the mean score of students in the third quarter in reading, and so
on. Just above the Results section toward the bottom of page 329, the investi-
gator incorrectly uses the word, "quartile", to mean quarter. The first
quartile is 31.5, the point below which 25% of the scores lie. The first
quarter of scores covers the range 0 through 31.
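
The distinction between a quartile (a single cut point) and a quarter (a group of scores) can be made concrete. The sketch below uses 40 hypothetical scores, not the study's data, and relies on the default interpolation method of Python's statistics.quantiles.

```python
from statistics import quantiles

# Hypothetical data: 40 distinct scores, 1 through 40.
scores = list(range(1, 41))

# The three quartiles are POINTS on the score scale.
q1, q2, q3 = quantiles(scores, n=4)

# The four quarters are GROUPS of scores bounded by those points.
quarters = {1: [], 2: [], 3: [], 4: []}
for score in scores:
    if score < q1:
        quarters[1].append(score)
    elif score < q2:
        quarters[2].append(score)
    elif score < q3:
        quarters[3].append(score)
    else:
        quarters[4].append(score)

print("quartiles:", q1, q2, q3)
for number, group in quarters.items():
    print(f"quarter {number}: {group[0]} through {group[-1]} ({len(group)} scores)")
```

One speaks of a student scoring "in the top quarter" but of a score lying "above the third quartile"; a comparison such as Q4 > Q3 in the table is a comparison of the mean scores of two groups of students.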



"The Influence of Analysis and Evaluation Questions
on Achievement in Sixth Grade Social Studies."

Francis P. Hunkins
Educational Leadership Research Supplement
January 1968, pp. 326-332



1. Do you think the title a good one? Why?

A good title for a research report will
describe the contents of that report as accur-
ately and as completely and concisely as pos-
sible. This particular study is an investi-
gation of the effects of several kinds of
questions asked upon subsequent achievement.
The written materials used and questions asked
dealt with social studies, the grade level was
sixth; relationships with reading level and
sex were investigated as well as teacher dif-
ferences and pretest scores for students. Not
all of these elements can be easily mentioned
in a title and therefore the investigator must
choose those he considers most essential for
inclusion.

Not clear from the title was that the prin-
cipal independent variable was the kind of
question asked and answer provided. A better
(but not the best) title would have been, "A
Comparison between Knowledge and Higher-level
Questions and Answers on Achievement in Sixth
Grade Social Studies." But other than this
concern that a description of the manipulated
variable be given priority in the title, we
think the title a fairly good one.

Some students objected to the use of the
word "influence" in the title and felt that 
the study was inconclusive and influence was 
not demonstrated. We do not share this con- 
cern because a title need not convey the spe- 
cific finding, only the intended problem. 
Thus, a study with a title that begins, "The 
Relationship Between," might have as its 
finding that there is no relationship between 
the variables investigated. Although titles 
such as "A Study of the Influence of..." and
"An Investigation of the Relationship Between
..." would be less ambiguous, it is accepted
practice to use the abbreviated version. 






2. Reread the Introduction. Does this re-
search provide a test of the hierarchical
hypothesis implicit in the Bloom et al.
taxonomy? Give reasons for your answer.
(See Special Notes for a discussion of this
hypothesis.)

The study does not prove the hierarchical
hypothesis is true; nor does the investigator
claim that it does. Since only three levels
are involved, this research cannot provide
a complete test of the hierarchical nature of
all 6 levels. Further, just because the re-
search is "concerned" with Bloom's taxonomy
does not mean it provides a test of it.
Whether the study even gives some support
for it is a question about which experts dis-
agree. Some say NO and argue that the hier-
archical hypothesis is assumed to be correct
and merely used as a starting point about
which the research is organized. Others say
YES and argue that the results are consistent
with the hierarchical hypothesis and thus
give some support for its validity.

A lesson to be learned from this specific
question and answer is that a criterion for a
hypothesis to be tested is the presence of data
which can count as evidence in support of or
against the hypothesis. Those who answered
YES should be able to point out such evidence.
The personal views expressed by many students
that the hypothesis is most reasonable and
that the distinctions among the cognitive
levels are very important do not, in themselves,
justify the conclusion that the hypothesis
was being tested by the research.






3. Reread the Objectives section. Do you 
think the overall hypothesis is a clear 
and accurate statement of the hypothesis 
the investigator wishes to test? 

The statement is probably quite accurate,
although awkwardly phrased. We would have
preferred deletion of the ending: "... in
relationship to..." A separate sentence could
have been added to describe these secondary
concerns.


We might note that the standard procedure
is to express the hypotheses in the direction
the investigator really expects them to be
true, rather than in the "null" form used.
The use of the null form is not incorrect,
but it does represent unsophisticated report-
ing. It is the substantive question (hypo-
thesis) the reader wants to know about.



4. Reread the section, General Plan of the
Study.

a) Did the investigator construct the
materials about Africa and Oceania?
Give reasons for your answer.

b) Were the questions used in the instruc-
tion of the multiple-choice type? Give
reasons for your answer.

c) The investigator attempted to reduce
the influence of the teacher on the experi-
mental situation by avoiding active teacher
participation. Was this wise? Give
reasons for your answer.

a) It is true that, "...two sets of text-
type materials...," were constructed by the
investigator. However, these sets stressed
questions and must have been widely supple-
mented by other materials, primarily the text-
book. Note that: "Pupils in both treatment
conditions were directed to read designated
sections of their textbooks..." We believe
that the special instructional materials con-
structed by the investigator consisted only
of questions and answers.

b) No, at least not all the questions
used in instruction were of this type. Note
that the pupils had, "...to respond in writing
to the questions on their worksheets." This
suggests that students had to construct
their responses rather than simply select
their responses as is the case with mul-
tiple-choice questions. (Do not confuse the
criterion test of achievement [which did con-
sist of multiple-choice questions] with the
questions asked as part of the instruction.)


c) Yes and no. By reducing teacher participation, the investigator can
be more certain that the differences in scores of students using the
two sets of materials are actually due to the experimental variable,
type of question asked. We say that the study is more likely to have
internal validity. However, the price paid for this internal validity
is a lessening of the external validity because the study results may
be reliably applied to limited classroom practices. By minimizing the
role of the teacher we cannot determine what the effects might be if
teachers asked the different types of questions rather than presenting
them in written form alone. The investigator has gained control at the
expense of conducting the investigation under fairly narrow and less
typical conditions. Many research experts argue, as this investigator
evidently does, that it is more important to guarantee that the
comparisons made are valid even though this validity necessitates
confining the research to a study of less typical practices. But
compromises must be made and we certainly do not fault the
investigator for restricting the role of the teacher.



5. Reread the section, Subjects.

a) Note that the proportion of boys to girls (67:60) in Condition A
does not equal the proportion (55:78) in Condition B. Does this fact
mean that the comparison on the criterion achievement test between
students in Conditions A and B is misleading? Why?

b) "Background data were collected and analyzed for both pupils and
teachers." What data were collected and was it important that the
investigator analyze them?

c) Primarily for students having had a course in statistics: On page
328, column 1, the investigator indicates that his criterion for
determining whether a background variable should be used as, "...a
possible covariant on subsequent analyses of the criterion data...,"
is whether pupils in the two conditions differ significantly (in a
statistical sense) on that variable. Is this a good criterion to use?
Why?

a) Unless controlled for in the analysis, a comparison between all the
students in the two conditions would be misleading if boys and girls
do not perform equally on the dependent variable. Since performance on
the criterion test is related to reading ability, and since the girls
in this study were reported to be better readers than the boys, it is
not unreasonable, therefore, to expect that on this basis boys and
girls will score differently on the criterion test. Thus, Condition B,
with a higher proportion of girls, could have an unfair advantage. As
it turned out, such bias is of little concern since Condition A was
still judged to be superior to Condition B in spite of the possible
advantage given to the latter treatment.

One way to control for these differences is to report criterion test
scores separately for boys and girls. Another way is to weigh equally
each of the 16 subcategories. (See Special Notes for a description of
these categories.) The investigator did not indicate the procedure he
followed to handle the disproportionate frequency-in-categories
problem. If an acceptable procedure for controlling the
disproportionate number of boys and girls in the two conditions were
used, then the comparison between conditions is not misleading.
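
The effect of unequal boy:girl ratios can be sketched with a small calculation. In the sketch below only the counts (67:60 and 55:78) come from the study; the score values and the function names are our own assumptions, invented for illustration. For simplicity it uses two subcategories (boys, girls) rather than the 16 mentioned above, and it assumes the treatments have no effect at all.

```python
# Hypothetical illustration: the mean scores below are invented.
# Only the boy:girl counts (67:60 and 55:78) come from the study.

def pooled_mean(cells):
    """Mean over all pupils; each cell is (count, cell_mean)."""
    total = sum(n for n, _ in cells)
    return sum(n * m for n, m in cells) / total

def equally_weighted_mean(cells):
    """Mean of the cell means; every subcategory counts the same."""
    return sum(m for _, m in cells) / len(cells)

# Assumption: no treatment effect. Boys average 20 and girls
# average 24 under both conditions.
condition_a = [(67, 20.0), (60, 24.0)]  # (boys, girls)
condition_b = [(55, 20.0), (78, 24.0)]  # (boys, girls)

print(round(pooled_mean(condition_a), 2), round(pooled_mean(condition_b), 2))
print(equally_weighted_mean(condition_a), equally_weighted_mean(condition_b))
```

Under these invented values the pooled means differ (about 21.9 against 22.3) purely because Condition B has more girls, while the equally weighted means are identical. This is the sense in which weighing each subcategory equally removes the unfair advantage.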

b) Pupil I. Q. and reading test scores 
were mentioned as well as (in the Results sec- 
tion) some kind of pre-test. Information about 
age, teaching experience and college degree 
was obtained from the eleven teachers. Other 
information concerning both pupils and teachers 
may have been collected but was not reported. 

Yes, it was important that such data were collected and analyzed,
especially in the case of the pupils. Since only eleven classes were
involved in the study, and gross inequalities between groups were thus
possible, it is important to know how these background variables
differ for Conditions A and B. Student data are also of value for the
purpose of understanding the limits of permissible generalizations.
Since the teacher influence was minimal, teacher differences are not
so important as pupil differences.

c) This is a technical question, the answer to which you are not
necessarily expected to know. Our answer is no. The criterion which
was quoted assumes, incorrectly, that failure to reject the null
hypothesis is equivalent to establishing its truth. Merely because the
differences in reading scores and I.Q. scores for students in
Condition A and Condition B are not statistically significant does not
mean the two groups are identical in these regards. There is
sufficient difference between the groups which could go a long way
toward explaining the difference on the criterion variable. Further,
the investigator fails to recognize another important reason for
including a covariate, namely, to increase the precision (power) of
the statistical test. Although beyond the scope of these notes,
suffice it to say that even if the two groups were equal on these
background variables, it would still be a good idea to employ them as
covariates in order to increase the likelihood of a true difference on
the criterion variable being detected.



6. The construction of the criterion test of achievement is described
in the section, Collection of Data. Reread this section.

a) Do you agree that, "...only the total achievement score was of
concern in this phase of the investigation."? Give reasons for your
answer.

b) From a pool of 59 items, 42 items were selected and 17 were
eliminated. On what basis was the decision made to accept an item? On
what basis were the 17 items eliminated?




c) Do you feel that each item measured the level of cognitive ability
that was intended? Why?

d) How important is it that this classification task be done
accurately? Explain.

e) Publishers of tests used in making decisions about individuals
often consider reliability indices of .90 or more as high (i.e., good)
and indices of less than .70 as poor. The investigator seems to be
unhappy with a reliability index of .68. Should he be? Why?

a) Absolutely not! Although the investigator may expect achievement to
be better on all questions for pupils in Condition A, surely he must
expect the most dramatic differences to occur on questions tapping
higher level skills. When possible, as in this case, it is important
to provide data which relate to predictions growing out of one's
conceptualization of what is going on. Failure to provide mean and
variability measures for both groups on all subtests is a serious
weakness of this study. (In a later study the investigator provides
such data.)

b) We are told that there was almost unanimous agreement on the
classification of the 42 items actually included in the criterion
test. We are not told, however, the reasons for excluding 17 of the
original 59 items and we can only assume that at least some of them
were eliminated because the judges could not agree on the appropriate
level.

c) We remain skeptical especially that the higher level abilities of
synthesis and evaluation were actually measured since it is most
difficult to devise multiple-choice questions which truly measure
these skills. Further, since the instructional materials were
different for the two conditions, it is possible that a single
question could be measuring at different cognitive levels for
different students because of this differential prior instruction. For
example, suppose that a question and answer used in the instruction
under Condition A concerned the evaluation of a particular content
area. A question in the criterion test asking for an evaluation of a
similar area would not be as novel a task for students in Condition A,
and thus for those students would not be measuring at this "highest"
level (evaluation) but rather it would be measuring lower level
skills. Just because an item contains the word "evaluate" does not
mean that it will necessarily measure a student's evaluative ability.
It is indeed unfortunate that no examples of questions from the
instructional materials and the criterion test were shown as evidence
that the investigators were able to overcome these difficulties.

It is, of course, important that competent judges be used to classify
the items. One point made above is that proper classification requires
more than competent judges. The items cannot be classified accurately
into the categories employed in this study without knowledge of the
students' prior instruction.

d) Had subtest scores been reported, as we suggested they should be in
our answer to 6a, then the correct assignment of item to taxonomy
category would have been very important indeed. Since only the total
score was reported and the same criterion test was given to both
groups, it probably wasn't important that the six categories be
equally represented and some classification mistakes certainly could
be tolerated.

e) This is a technical question, the answer to which you are not
necessarily expected to know. Our answer is NO. A high reliability
coefficient is not required for a criterion measure in a research
study comparing groups. Here's why. High reliability assures us that
differences in test scores are not due to measurement error. Unless a
test has high reliability, then differences in an individual's test
scores (used to measure "gain" or to determine his relative strong
areas) may be due to measurement error. In a research study comparing
groups, what are being compared are not the differences between two
individual scores but rather the differences in means, each based on
the scores of many individuals. Although any one score may have
measurement error, the "too high" and "too low" errors will balance
out over many people, leaving us quite confident that this mean score
is fairly free of measurement error. That is why we can tolerate a
lower reliability in the measures we use in research studies. The
value of .68 reported by the investigator is quite acceptable.
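
The claim that random errors balance out over many people can be checked with a small simulation. The sketch below is ours, not the investigator's: the error standard deviation of 3 score points, the group size of 130 (roughly the size of each condition), and the function name are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_mean_error(n_pupils, error_sd, n_trials=2000, seed=0):
    """Return the SD, across trials, of the error in a group mean.

    Each pupil's score carries an independent random measurement
    error with mean 0 and standard deviation error_sd (assumed).
    """
    rng = random.Random(seed)
    mean_errors = []
    for _ in range(n_trials):
        errors = [rng.gauss(0.0, error_sd) for _ in range(n_pupils)]
        mean_errors.append(statistics.fmean(errors))
    return statistics.pstdev(mean_errors)

# Error in one pupil's score versus error in the mean of 130 pupils.
single = simulate_mean_error(1, error_sd=3.0)
group = simulate_mean_error(130, error_sd=3.0)
print(single, group)
```

Under these assumptions the error in a single score stays near 3 points, while the error in the group mean shrinks to roughly 3 divided by the square root of 130, about a quarter of a point. This holds only for random errors; a systematic error would not average out.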



** Reliability means consistency of measurement. If a test
consistently gives systematic errors, i.e., errors which are
consistently too high or consistently too low, then we say the test is
invalid, but still it can be reliable. Unreliability occurs because of
random measurement errors, which, if averaged over enough people (or
over many test items) will balance out. Means of many scores (or very
long tests) are usually very reliable.

7. Reread the section on Experimental Material and Procedure on pages
328 and 329.

a) At the top of page 329, the investigator indicated that it was
important that the unit to be studied be one about which the subjects
did not have "... abundant prior knowledge." Do you agree? Give
reasons for your answer.

b) Note that 47.53% of the questions used in Condition A were in the
analysis and evaluation categories; in Condition B 87.38% of the
questions were in the knowledge category. Alternatively, the
investigator could have had all the questions in Condition A in the
analysis and evaluation categories and all the questions in Condition
B in the knowledge category. Would this have been an improvement? Why?

c) Was the readability analysis a wise thing to do? Explain.

d) Is it possible to compare the content of the questions and answers
used in the instruction to the criterion test questions? If not, is
this inability a serious shortcoming? Give reasons.

a) Yes, we feel it was wise for the investigator to select a topic
about which the students did not have abundant knowledge. Our reason
is that using such a topic insured that the question and answering
procedures would have a chance to make a difference because there were
still many things the students could learn. In other words, if
students already knew a great deal about a topic, leaving little to
learn before the study began, then one procedure of instruction could
not be expected to result in more learning than the other procedure. A
second reason is that when students have different levels of prior
knowledge about a subject, it is difficult to construct items which
will measure at the same cognitive level for all students. (Recall our
answer to question 6c.)

Many students answered YES for a reason different from the ones we
gave above. They felt that the students in the study with prior
knowledge would have an unfair advantage and another extraneous factor
would be introduced. This would certainly be true, but it should not
be a cause for concern unless it is suspected that the students in one
of the two groups had, on the average, more prior knowledge than the
other group.




Other students answered question 7a NO and remarked that knowledge of
some facts is important, for without such knowledge the students in
the study could not be expected to analyze and evaluate. This is true,
but the issue is not whether the students in the study should have
this information (they should) but whether they should be given this
information before they are exposed to the instructional materials.

b) Before giving our answer to Question 7b, note that the
kinds-of-questions-asked represents, in this study, the variable which
is under the control of the investigator, that is, the variable being
manipulated. In appraising the work of others, pay particular
attention to the levels or conditions which are being used. The
results depend upon it.

If it is true that asking higher level questions really makes a
difference, then the alternative distribution proposed, involving 100%
or 0% in a given category, would give the investigator the best chance
to discover differences between conditions. As one student put it, to
do otherwise would, "...water down the effectiveness... of the
experimental treatment."

On the other hand, the ratios of the different kinds of questions the
investigator chose to compare are more typical of what one would
expect to find in existing materials (or in the questioning patterns
of teachers) and what one would hope to find in materials that
emphasized higher level questions. We personally approve of the
investigator's decision to make the balance of question types more
closely resemble present and sound practices rather than to use
conditions as different as possible. Clearly, use of either
distribution is justified.

c) Yes, to conduct a readability analysis was a wise decision,
although reporting readability data separately for the two groups
would have been preferable. The readability analysis would have been
important had the investigator failed to find differences in favor of
Condition A. If that had occurred, one explanation for finding no
differences, namely that the reading materials were too hard, could be
ruled out by the fact that the mean reading level was within the range
of fifth and sixth grade pupils. As one person answering question 7c
put it, the readability analysis, "...knocked out the possibility of
massive inability to comprehend the questions."

Further, if Condition A students had not done better than the other
students, we might have wondered if the higher level questions and
answers were more difficult to read. The plausibility of this
explanation could be assessed by having readability figures shown
separately for materials used in Conditions A and B.

Notice that a reading difficulty index was computed for the answers as
well as for the questions. This alerts us to the fact that the answers
are, in all probability, more than cryptic responses and that by
providing answers, additional instruction must have been given. The
implication of this will be evident in the answer to the next
question.

d) No, we are not told how similar the questions asked in the
criterion test were to the questions and answers given in the
instruction. This omission is probably the most serious shortcoming of
the study. We know only that during instruction different questions
and answers were given to the two different groups. It is extremely
difficult, if not impossible, to choose criterion test items upon
which these differences in the instructional materials had no bearing.
The fact the investigator does not mention this problem and report in
detail how it was circumvented is a serious weakness. It suggests that
the differences between the two conditions could be accounted for
entirely by the different content of the instructional materials (and
especially in the answers provided, which were admittedly more
complicated [p. 330, column 1] in the case of the evaluation and
analysis questions than the knowledge ones) rather than by the
practice of answering analysis and evaluation questions alone.



8. Reread the section, Analysis of Data, page 329. Do you approve of
using sex and reading achievement as additional variables in the
analysis? Why?

Yes, inclusion of these variables helps to determine the
generalizability of the findings. Because there was presumably little
interaction between these variables and the treatment variable, it
means that the differences between the scores under the two conditions
seemed to be about the same for both sexes and across the reading
groups. If, for example, the higher level questions and answers were
relatively less effective with poor readers this would show up as an
interaction between treatment and reading. By including reading and
sex as variables in the analysis the investigator could identify the
limits to the generalizability of the results in these respects.



It was also important to include these two variables because they were
explicitly mentioned in the statement of the hypothesis of the study
(p. 327) and, thus, a complete test of this hypothesis requires their
inclusion.

Some students made a good case in support of inclusion of one of the
two variables. A separate analysis by sex was deemed necessary because
of the difference in boy/girl ratios in the two treatment conditions.
Others argued that the reading variable was very important to include
because of the suspected relationship between reading and performance
on the criterion task.






9. Reread the results section on pages 329 and 330, including Table 1.
(You may wish to review the Special Notes regarding the interpretation
of Table 1.)

a) Note that at the very bottom of page 329 the investigator refers to
pre-achievement scores. Did he ever report how these scores were
obtained? Regardless of whether or not he did so, do you think it was
a good idea to obtain such scores and, once obtained, should they have
been used? Why?

b) Find the number 9.85 in the F column of Table 1. Is the difference
in mean scores for students in the two treatment groups statistically
significant? Here's a question you may not know the answer to. Does
the number 9.85, by itself, indicate which treatment group performed
better?

c) Does the investigator ever indicate the numerical value of the
differences in mean scores for students in the two treatment
conditions? If so, what is it? If not, should he have done so?

a) The nature of the pre-achievement scores was not specified. They
could have been previous achievement grades in social studies. They
could have been scores on the criterion test administered before the
textbook and special materials were used. If the latter is the case,
there is a slight danger that seeing the criterion test ahead of time
would be of greater help to students in one condition than in the
other. The investigator did mention on page 328, however, that the
reading and I.Q. scores were not used as the covariates; that is, they
were not used as the pretest.






Once the pre-achievement scores were obtained, it was a good idea to
adjust the criterion scores on the basis of differences on the
pre-achievement scores not only to equate the groups, but (as
mentioned in answer to Question 5c) to give greater power to the
analysis. We find it strange that the investigator did not tell us how
the two groups differed on these pre-achievement variables or describe
them more clearly.

b) This F number, as indicated by the **footnote, signifies that the
means for the two treatment groups are significantly different. The
so-called F test, however, does not indicate which group scored higher
but only that the differences could not reasonably be accounted for by
chance alone. We have to look at the mean values to find out which
group did better. In this report, we must rely on the statement in the
text that Condition A pupils performed better.
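
That an F value carries no direction can be seen from how it is computed. The sketch below is a plain two-group analysis of variance, not the analysis of covariance used in the study, and the scores and the function name are invented for illustration.

```python
from statistics import fmean

def one_way_f(group1, group2):
    """F ratio for a two-group one-way analysis of variance."""
    groups = [group1, group2]
    grand = fmean(group1 + group2)
    n_total = len(group1) + len(group2)
    # Squared deviations of group means from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    # Squared deviations of scores from their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Invented scores: one group clearly higher than the other.
high = [14, 15, 17, 18, 16]
low = [11, 12, 14, 13, 10]

# Swapping the groups leaves F unchanged.
print(one_way_f(high, low), one_way_f(low, high))
```

Because every term is a squared deviation, the F ratio is the same whichever group is higher. Only the means themselves show which group did better.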

c) A surprising deficiency is the failure to report the criterion
means for the two groups. We don't know if the difference in means is
large or small. To find statistical significance in mean differences
is only the initial step in a proper interpretation of a research
study. If the difference were as much as 1/2 a standard deviation (a
variability measure like the standard deviation should also have been
reported) the difference would have important practical implications;
if the difference were only 1/100th of a standard deviation, even
though the difference was statistically reliable, it would lack much
practical significance. The magnitude of the differences should
definitely have been given.
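
Expressing a mean difference in standard deviation units, as suggested above, takes only the group means and a pooled standard deviation. The sketch below shows one common way to do this; the scores and the function name are invented, since the study reports neither means nor standard deviations.

```python
from math import sqrt
from statistics import fmean, variance

def standardized_difference(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (fmean(group1) - fmean(group2)) / sqrt(pooled_var)

# Invented criterion scores for two small groups.
scores_a = [22, 25, 27, 30, 26]
scores_b = [20, 24, 25, 29, 22]

print(round(standardized_difference(scores_a, scores_b), 2))
```

With these invented scores the difference is a little over 1/2 a standard deviation. Unlike F, the sign of the result also shows which group scored higher.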

**Note that the number 10.05 in Table 1 is not the difference in group
means. Rather, it is the result of an intermediate calculation in the
analysis of covariance.



10. Reread the discussion section, column 1, page 331. The
investigator indicates that this study suggests the following: that
questions requiring analysis and evaluation, "...stimulated
individuals to utilize general viewpoints regarding the information
embedded in the task."; forced "mental juggling" of the materials; led
to greater, "...interaction with the materials presented," and have
the potential, "...to make pupils uneasy." What evidence supports
these suggestions?

None that we know of. It is true that students given the greater
proportion of "analysis" and "evaluation" questions and answers
performed better on the criterion test. But the study was not designed
to determine how this superior performance came about. The statements
of the investigator quoted by us in Question 10 represent admitted
guesses on his part of changes occurring inside the student rather
than assertions based on reported evidence. It is quite acceptable for
an investigator to report his speculations as long as they are clearly
labeled so that the reader can recognize them as unsupported views.






11. Assume that the research were redone so as to overcome the
criticisms mentioned earlier and that similar findings in favor of
Condition A resulted. What limitations would still remain to this
single study which would prevent one from generalizing with confidence
that questions of higher cognitive levels generally stimulate higher
achievement?

The study investigates one topic, in one subject area, for students in
one grade, from one suburban school system. Further, it is limited to
written self-instructional materials and we don't know if the findings
would hold up for the situation in which teachers ask the same
questions. Further, only achievement immediately after study was
measured. Of more importance is the long-term impact as measured by a
delayed post-test. A single study cannot have universal applicability.
This study did look at both sexes and various reading levels. We do
not fault the investigator for not including more topics and delayed
post-testing, etc. We only mention these "extensions" to alert you to
those situations to which the results might not apply.




Note: If you are interested in 
reading a review of the research 
on the effect of questions on 
learning, see the December 1970 
issue of the Review of Educational 
Research . 




Appendix VI 
Dolores Durkin 



Children's Concepts of Justice: A Comparison
with the Piaget Data



Child Development, 1959, 30, 59-67



CHILDREN'S CONCEPTS OF JUSTICE: A COMPARISON 
WITH THE PIAGET DATA 

Dolores Durkin
Child Development, 1959, 30, 59-67

QUESTIONS 

1. Appraise the educational significance of the study. In this
context, by educational significance we mean the import for those
responsible for the education of children and what they might do as a
consequence of the assertions established by the study.

2. Give your critical appraisal pro and con concerning how well the
investigator has accomplished the first named principal purpose for
the study (as indicated in the Special Notes). Evaluate the adequacy
of the design (subjects, interview procedure), appropriateness of the
analyses (categorization scheme, statistical tests), and the validity
of the interpretations and conclusions (of both the Rambert and the
present studies).

3. Note the second purpose for the study (as indicated in the Special
Notes). Briefly evaluate how well this purpose has been accomplished.
Pay particular attention to the investigator's notion of intelligence.



SPECIAL NOTES



Introductory Section:

The investigator has indicated two principal purposes for the
present study. The primary purpose is, using American children, to test
Piaget's empirical claims that children up to about 8 years of age typically
appeal to adults to redress wrong and to provide appropriate punishment;
that from 8 to 11 they shift to an equalitarian notion of justice
characterized by reciprocity (an eye for an eye); and that from 11 or 12
onward they associate reciprocity with equity (retribution takes account
of circumstances).

A second purpose of the study is to investigate whether intelligence,
rather than chronological age, is the significant factor in the
development of a child's concepts of justice.



Description of Piaget's Study:

A thorough critical appraisal of a research report requires
familiarity with the context out of which the study comes. This is especially
true of the present study, which has as its focus a comparison with the
research results from another investigation. The experiment being replicated
was actually conducted by Mlle. Rambert, and reported by Piaget in his
1932 book, The Moral Judgment of the Child. The results of that study which
most relate to the Durkin paper may be found on page 302 of the Piaget book
and are shown below.



PERCENT OF GIRLS AND BOYS RESPONDING IN VARIOUS
CATEGORIES TO THE QUESTION, "IF ANYONE PUNCHES YOU, WHAT DO YOU DO?"

N = 167

       "It is naughty"    Give back the same    Give back more    Give back less
Age    Girls    Boys      Girls    Boys         Girls    Boys     Girls    Boys

  6     82      50         18      37.5          --      12.5      --       --
  7     45      27         45      27            10      46        --       --
  8     25      45         42      22             8      33        25       --
  9     14      29         29      57            --      14        57       --
 10     --       8         20      54            --      31        80        7
 11     --      --         33      31            --      31        67       38
 12     --      --         22      67            --      10        78       23






The table can be read as follows: 82% of the responses of girls,
age 6, were categorized as "it is naughty"; the remaining 18% were placed
in the category, give back the same. The children did not say "it is naughty"
as a direct response to the question, "what do you do?" It is only when
asked additional questions such as, "do you hit back?", that the child
might respond, "it is naughty." Some children did say they would tell
someone in authority as their first response to the "what do you do?" question.
In such cases the determination of which of the four categories to
use depended upon the children's replies to further questions.

On the basis of the above data and the transcriptions of the
complete interviews, Piaget concludes that:

...the children who do not hit back (most of them are from
the younger ones), are primarily submissive children who rely
upon the adult to protect them and who are more anxious to
respect or make others respect the orders that have been
received than to establish justice and equality by methods
appropriate to child society. As for the children who hit
back, they are far more concerned with justice and equality
than with revenge properly so called... Among those who give
back more blows than they receive there is, of course,
a combative attitude, which goes beyond mere equality: but it
is precisely this attitude which diminishes with age. (p. 305)



Statistical Analysis of Durkin's Study

The author makes extensive use of chi square statistical procedures to
test several hypotheses. In each one, the investigator is comparing the observed
frequencies of responses to those expected on the basis of a chance distribution
of responses or those expected if the variables of interest were not related.
When the discrepancies between observed and expected frequencies are large (as
evidenced by a large numerical value for the chi square statistic) the
hypothesis of chance distribution or no relation is rejected and support is
evidenced for a nonchance, or statistically significant, relationship. The end
product of a chi square test is a probability (p) of obtaining discrepancies
between observed and expected frequencies as large or larger than those found
in the study. A small probability of getting such large discrepancies (in this
study, small is 5% or less) is the criterion for rejecting the chance relationship
hypothesis and for claiming statistical significance.

Page 62, last paragraph. This paragraph describes the results of
three chi square tests. The set-up for the test for grade two is shown below.



             Authority    Aggression    Other

Observed        15            8           5

Expected     (9 1/3)       (9 1/3)     (9 1/3)



The numbers 15, 8, and 5 are taken from Table 1; the numbers in parentheses
(9 1/3) are the expected frequencies. Under the chance distribution hypothesis,
the 28 second graders would be expected to give responses in the three
categories equally, or 28/3 or 9 1/3 responses in each. The end product
for this 2nd grade significance test is reported at the end of the paragraph
as ".05 < p < .10". The symbol < means "less than". Thus, the
discrepancies between the actual frequencies and 9 1/3 could be expected to
happen by chance alone between 5 and 10% of the time. By the 5% criterion,
these discrepancies were not statistically significant.
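
The arithmetic behind this set-up can be checked with a short sketch. It is our own illustration, not part of the Durkin article: the function names are hypothetical, and the closed-form probability used here is valid only when the degrees of freedom are even (here, 3 categories less 1, or 2).

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_even_df(statistic, df):
    """Upper-tail probability of a chi square value (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = statistic / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series

# Grade two frequencies from Table 1: authority, aggression, other.
observed = [15, 8, 5]
n = sum(observed)                                   # 28 second graders
expected = [n / len(observed)] * len(observed)      # 9 1/3 in each category

statistic = chi_square_statistic(observed, expected)
p_value = chi_square_p_even_df(statistic, df=len(observed) - 1)
print(round(statistic, 2), round(p_value, 3))
```

Under these assumptions the statistic comes to about 5.64 and the probability falls between .05 and .10, in agreement with the result reported by the investigator.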



Page 63, first paragraph. Here is described the second use of the
chi square test. The set-up is shown below. The expected frequencies
under the hypothesis of no relationship between age and kind of response
are shown in parentheses.



            Tell           Return
Grade       Authority      Aggression     Other         Total

  2         15 (15.2)       8 (7.5)        5 (5.3)        28
  5         13 (20.7)      15 (10.2)      10 (7.1)        38
  8         27 (19.1)       4 (9.4)        4 (6.6)        35

Total       55             27             19             101



No relationship means that the ratio of 2nd, 5th and 8th graders giving each
of the three kinds of reasons will be the same. The discrepancy between
observed and expected frequencies is very unlikely if there were no relationship
in some hypothetical larger population, as evidenced by the probability
figure between 1/2 of one percent (.005) and 1% (.010). The no relation
hypothesis is thus rejected and the results are statistically significant.
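
The expected frequencies and the probability range quoted above can be reproduced with a short sketch. This is our own illustration rather than the investigator's computation: each expected frequency is taken as (row total x column total) / grand total, and the closed-form probability applies because the degrees of freedom, (3 - 1) x (3 - 1) = 4, are even.

```python
import math

# Observed frequencies: tell authority, return aggression, other.
observed = {
    "grade 2": [15, 8, 5],
    "grade 5": [13, 15, 10],
    "grade 8": [27, 4, 4],
}

row_totals = {grade: sum(counts) for grade, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Expected frequency under "no relationship" = row total * column total / N.
expected = {
    grade: [row_totals[grade] * c / grand_total for c in col_totals]
    for grade in observed
}

statistic = sum(
    (o - e) ** 2 / e
    for grade in observed
    for o, e in zip(observed[grade], expected[grade])
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form is valid only for even df.
half = statistic / 2.0
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

for grade in observed:
    print(grade, [round(e, 1) for e in expected[grade]])
print(round(statistic, 2), df, round(p_value, 4))
```

The printed expected frequencies should match the figures in parentheses in the set-up, and the probability should fall between .005 and .010.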

Page 65, footnote 3. The probabilities computed from a chi
square test are only approximate. When the expected frequencies are not
too small (say, all more than 5) the approximation is extremely good. In
footnote 3 the investigator is saying that when the responses were spread
over more categories, the theoretical (i.e., expected) frequencies for these
categories were too small to permit accurate estimates of the desired
probabilities.
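
A small sketch shows why spreading responses over more categories can undermine the approximation. The rule of thumb (all expected frequencies more than 5) comes from the note above; the choice of six categories is a hypothetical figure of ours, not the number the investigator actually tried.

```python
def equal_split_expected(n_responses, n_categories):
    """Expected frequency per category under a chance (equal) distribution."""
    return [n_responses / n_categories] * n_categories

def approximation_is_safe(expected_frequencies, minimum=5):
    """Rule of thumb: every expected frequency should exceed the minimum."""
    return all(e > minimum for e in expected_frequencies)

n_second_graders = 28
three_way = equal_split_expected(n_second_graders, 3)   # 9 1/3 each
six_way = equal_split_expected(n_second_graders, 6)     # hypothetical finer scheme

print(approximation_is_safe(three_way))
print(approximation_is_safe(six_way))
```

With three categories each expected frequency is 9 1/3, so the rule of thumb is met; with six, each drops to 4 2/3 and the rule is violated.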






CRITIQUE 



DURKIN, DOLORES, "Children's Concepts of Justice: A Comparison with the
Piaget Data," Child Development, 1959, 30, 59-67.



Educational Significance

1. The educational significance of a study dealing with children's
concepts of justice should be clearly evident. Elementary school years are
usually seen as a time when children learn to cope with aggression by learning
standards of fairness to apply to interpersonal conflicts. Elementary school
teachers are expected to be able to understand these conflicts and to aid
pupils in developing appropriate standards of conduct. A commonly-held
assumption in teacher education as well as developmental psychology is that
studies of child development help determine teachers' expectations of
children of different ages and abilities. It could therefore be easily
assumed that studies such as Durkin's would have educational significance.

2. One student reader asked, "How can we focus on ways to teach children
until we first focus on children?" This question assumes that descriptive
studies of what is the case set limits on what teachers ought to do. Thus,
if descriptive studies show that, say, eight-year-olds have not yet developed
certain moral concepts of, say, autonomy, then teachers should not try to
teach them these new ideas. In one sense, of course, a child must crawl
before he can walk. Some things do come before other things in the development
of a child. The assumption of the stages of development underlies much
of Piaget's work. But information about what children "naturally" learn
in the course of their development does not overlap completely with what
they might learn under conditions of schooling. Teachers intervene in
"natural" development. Thus, we see that the significance of the Durkin study
(and others like it) for teachers is considerably less than what we might
at first suppose. The Durkin study does not give evidence or advice to
teachers about what they might (positively) do with children in teaching
them about proper responses to actual or threatened physical aggression.

3. After reading paragraph two, one student reader complained that
it is wrong to criticize the author for failing to study the teaching of moral
concepts. After all, "Isn't it unfair to criticize a work for something the
author did not intend?" It is unfair, or rather, inappropriate, to confuse
a criticism of the author's intentions with a criticism of other points.
Clearly any work has an audience beyond the audience for which the author
intended the work. That the author may not have intended the work for
teachers does not mean that the work cannot be criticized from the point
of view of teachers and teaching.

4. Another student reader defended the significance of the study
another way. She wrote: "Piaget has proposed a theory in child development
which is quite well known. This study has educational significance
because Piaget has earned the attention of educators. Any attempt to expand
upon or reconfirm his findings is important."






5. We agree in part. We commend the attempt to rework ideas in one
culture (American) which have earned recognition in another culture.
Because of contextual considerations, studies done in one place need to
be redone in another place if they are to be utilized there with confidence.
Nevertheless, the mere fact that a study relates to Piaget's work does not
automatically confer significance on it. Everything depends upon what the study
asserts about the significant phenomena of interest.

6. A reader who thought the study lacked educational significance
wrote as follows: "This study reports the obvious, namely that older
children grasp conceptual complexities younger ones don't. Educators do
not need this point demonstrated." There are some not-so-obvious things
to say about the obvious.

7. The obvious is usually the conventional, when it comes to
educational matters, and the conventional usually has aspects of both
right and wrong.

8. Secondly, what seems obvious at the end of an inquiry might not
have been so obvious at the beginning. Presumably any inquiry is an attempt
to find something out that is at least somewhat in doubt. If we really
knew for certain in the beginning of an inquiry what we wanted to know by
the inquiry, then it is not likely that we would undertake the study.

9. Consider this table:

BEGINNING OF INQUIRY*     END OF INQUIRY                    RANK IN SIGNIFICANCE

Conventional Wisdom       Conventional Wisdom Reaffirmed    4th
Conventional Wisdom       Surprising (new) Results          1st
Puzzling Phenomena        Conventional Wisdom Reaffirmed    3rd
Puzzling Phenomena        Surprising (new) Results          2nd

*Research does not necessarily have to begin at either of these
two starting points. It can begin in theory, for example. This table reflects
only one way to look at the question about research into the obvious.






10. Our significance rankings are arguable, of course. Nevertheless,
to take the first case, simply to reaffirm what everyone already knows is
perhaps of only mild interest to either practitioners or researchers. To
obtain the highest ranking of the four possibilities one must begin with
the conventional and hope to find out something which is surprising to
both practitioners and researchers. The third case ranks fair, in our
judgment, because it begins with something puzzling and reaffirms a
portion of the ambiguous conventional wisdom; we now know which horse to
back in the ordinary races of the day. The fourth case ranks high because
we find out something we did not know about something which had been puzzling
us.

11. Thus, although we agree with the reader mentioned in paragraph six
that the findings are what one would expect and therefore the study has
limited educational significance, at the same time we support occasional
"high risk" studies because from such investigations high levels of
educational significance can result.

12. The educational significance of the research problem is discussed
in paragraphs 1-5 and, in paragraphs 6-11, consideration is given to the
significance of the research findings. One can also assess the significance
of research as conducted and ask whether the actual study contributes
significantly to the solution of the research problem, as defined by the
investigator.

13. Metaphorically we may say that any inquiry is only a beam of
light on a vast, clouded and perhaps dark area of interest. No beam of
light will illuminate the whole area; thus, any single study has to be
less than comprehensive. Granted the necessity to limit any study, special
care must be taken to see that the actual study is not too small (only
a pinhole rather than a beam of light). For reasons discussed above and
subsequently, we think the Durkin article is more like a pinhole than a
beam of light in its educational significance.



Question 2: Purpose 1 

Adequacy of the Design.



14. Subjects. Since all the subject children came from the same
community and school, the investigator may safely claim that the differences
in responses observed are not the result of broad environmental differences.
Since the environmental factors have been equated, the observed differences
most likely reflect age differences, although some underlying factor (such
as intelligence) cannot be ruled out as at least partially responsible.

15. Whatever its advantages, the use of such a small and homogeneous
sample makes generalization to other U. S. school children hazardous. Age
differences in responses for this single school may not be similar to those
one would find in other schools or other communities in the United States.
For example, we do not know the "rules" of the school and teachers regarding
fighting and other forms of aggression, rules which might have a disproportionate
influence on the children's responses.




16. The investigator wishes also to test for cultural differences by
comparing the responses of her sample of children with those made by
the children used in the study reported by Piaget. The latter were described
as coming "from the poorer parts of Geneva (Switzerland)." At least
five characteristics of the sample as described prevent an adequate comparison
for these cultural differences: a) the two samples (from Piaget, from
present study) were selected roughly 30 years apart and thus time as well as
cultural determinants are involved; b) the sample used in the present
study is a homogeneous one and may not be representative of American culture;
c) the sample used in the present study is from a rural community whereas
the earlier study employed urban (Geneva) children; thus the differences
may not be strictly cultural; d) the oldest children used in the sample
reported by Piaget were 12, whereas in the present study the oldest group
had a mean age of almost 14; thus differences in response could reflect
age differences; e) although residents are placed as "poor," "average," or
"rich," we do not know how, for example, the resources of the family of an
"average" child of the present study compared with the resources of the family
of a child used in the study reported by Piaget.

17. The fact that the differences in sample may be multidimensional
and not purely cultural is probably not as serious a weakness as the
above discussion would lead you to believe. One aspect of Piaget's
moral theory is that the changing responses reflect a basically genetic
development. Thus, if the investigator notes substantial discrepancies
in the responses of the two samples of children, the genetic development
position is weakened, a finding of some theoretical importance.

18. Interview Procedures. Differences between the two samples are to
be expected because a purpose of the study was to replicate the findings of
Piaget with a different type of child. But differences between the studies
in the interview procedures clearly make any valid comparison between the two
studies unlikely and represent a serious weakness of the study.

19. One modification the present investigator made in the interview
procedure is in the initial question asked. The investigator asked the
child what Vann "should do." In the Rambert study reported by Piaget, the
child himself was asked what he "does do." Not only is the new question more
detached than the original because it involves two fictional children, but
"should do" is substituted for "does do." These modifications might make a
considerable difference in the responses. (Once the investigator decided upon
fictional names, the use of rarely used names like Vann and Bennett was a good
strategy since it reduced the likelihood that the responses would systematically
be affected by the characteristics of a real Vann or Bennett known to the
children. We are assuming the names Vann and Bennett are rare in this
context.)

20. A second modification in the procedure is the avoidance of the
"clinical method." Recall in the special notes that, in the earlier study,
much inquiry took place after the child's initial response. Indeed, it
was the responses to these later, more probing questions which determined
the response category for the child. The present investigator restricted
herself to a single question, except in one case when an eye-for-an-eye
response did prompt her to provide a follow-up question.

21. Anyone who tries to replicate a Piagetian experiment faces a real
problem because Piaget's experimenters are trained in the "clinical
method" and routinely conduct a short inquiry into the meaning of a child's
answers to any standardized question. In many cases not even the initial
question is standardized. A good case in point is the study being replicated
by the investigator.

22. As we noted, the investigator has modified the procedures, no doubt
in the interest of objective reporting and to avoid the "clinical method."
This modification, however, was made at the price of losing any valid basis
for comparison with the earlier study. It is unfortunate that the investigator
does not mention any of these problems, but instead conveys the
impression that she merely replicated the earlier experiment using a different
cultural group.

23. Somewhat parenthetically, we might add that had the investigator
not wanted merely to replicate the earlier work, but rather had wanted to
study the development of the concept of justice in the best way possible,
then other procedures of gathering data should obviously have been considered.
For example, what a child says he would do when speaking to an
adult may be quite different from his actual behavior. Some check for
such a discrepancy might be included. Further, the single question focuses
on too narrow a range of the factors involved in making a moral judgment
and concerns itself with only a single aspect of justice. Other factors
might be investigated. However, because the purpose of the present study
was replication, we do not fault the investigator for not including such
extensions in her data gathering procedures.



Appropriateness of the Analyses.

24. Categorization of the Responses. Even a cursory glance at the
special notes accompanying the article will reveal that the investigator
used a categorization scheme different from that employed in the earlier study.
This permits at best only a rough comparison of the data from the two studies.

25. Once the decision was made to drop Rambert's classification system
(presumably for a system believed to be more objective), it seems to us that
a finer and more productive set of categories could have been established.
For example, we believe that the following categories would have been
preferable: a) tell authority, b) retaliate, c) conflict, with resolution in
direction of telling authority, and d) conflict, with resolution in direction
of retaliation.

26. Finally, we note that the interviews were tape-recorded. We
wonder why individuals ignorant of the respondent's age were not used to
categorize the responses. Such a procedure would guard against at least
one form of experimenter bias.






27. Statistical Tests. Although admittedly a minor point, we fail to
see the purpose of the statistical test mentioned at the bottom of page 62
and interpreted in some detail in the special notes. The question being
tested is whether, for a given grade, the responses tend to pile up more
in one or two categories than would be expected if the probability of
assignment to the three categories were equal. Failure to find significance (as
was the case for 2 of the 3 grades) could be interpreted to mean that there
is no modal response for a grade level and that it would be misleading to
say that such and such a grade level child is characterized as being in one
or more categories. But such an interpretation was not given by the
investigator.

28. A more serious problem with the statistical analysis relates to the
chi square analysis. The chi square test measures only the discrepancy
between actual and expected frequencies and does not take into account the
order or pattern in which the discrepancies occur. Thus, for each hypothetical
layout shown below, the chi square test will give identical results
even though the direction of the effects is different. As before, expected
frequencies are in parentheses.



                       Grade
               2         5         8

Layout (a)   4 (6)     6 (6)     8 (6)
Layout (b)   8 (6)     6 (6)     4 (6)
Layout (c)   6 (6)     4 (6)     8 (6)
Layout (d)   8 (6)     4 (6)     6 (6)



Since the investigator is hypothesizing a specific direction or trend (linear
with age) and, after seeing the data, a curvilinear trend, statistical tests
which would be more powerful in detecting such trends should have been used.
In other words, the statistical analyses employed should match the research
question.
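
The point about order can be verified with a brief sketch using the four hypothetical layouts (observed frequencies 4, 6, 8 in different orders, each with an expected frequency of 6). The linear contrast with weights -1, 0, +1 is our own illustrative choice of a direction-sensitive summary; it is not a test named in the article or in this critique.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def linear_contrast(observed, weights=(-1, 0, 1)):
    """Weighted sum that is positive for a rising pattern, negative for a falling one."""
    return sum(w * o for w, o in zip(weights, observed))

layouts = {
    "a": [4, 6, 8],
    "b": [8, 6, 4],
    "c": [6, 4, 8],
    "d": [8, 4, 6],
}
expected = [6, 6, 6]

chi_values = {name: chi_square_statistic(obs, expected) for name, obs in layouts.items()}
contrasts = {name: linear_contrast(obs) for name, obs in layouts.items()}

for name in layouts:
    print(name, round(chi_values[name], 3), contrasts[name])
```

Every layout yields the same chi square value (8/6, or about 1.33), whereas the contrast separates the rising pattern in layout (a) from the falling pattern in layout (b).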

29. Finally, we wonder why the investigator: a) did not analyze her
results by sex in view of the sex differences evident in Rambert's data; and
b) went to the trouble to place each child into one of three categories of
economic status when no analysis was conducted by level of economic status.
The analyses performed by the investigator are not incorrect; they are merely
incomplete.



Validity of the Interpretations and Conclusions.

30. Interpretation of Rambert's Data. In speaking of the earlier study,
the investigator writes: "They generally proposed two quite different solutions.
Younger subjects favored reporting to an authority person; older subjects,
a return of the aggression." (p. 59). Piaget never reported a specific


percentage of subjects who replied that they would tell an authority,
although in other contexts he does describe the early period of the child's
moral development as being marked by a submissive attitude to authority.
So, although it is not unreasonable for the investigator to suggest the trend
above, it is not an accurate rendering of Piaget's report of Rambert's
experiment itself.

31. On page 64 the investigator quotes Piaget as saying that "children
maintain with a conviction that grows with their years that it is strictly
fair to give back the blows one has received." This statement, which is
repeated in a different form in conclusion 1 (p. 66), is not an accurate
summary of the Rambert data and the investigator should have been more
critical of Piaget's interpretation. For girls, this option of strictly equal
retaliation declines with age in favor of under-retaliation (see the table
in the special notes section). For boys, there is an increase in equal
retaliation responses but the trend is not at all clear.
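
The trends just described can be inspected directly from the Rambert percentages given in the special notes. The sketch below is our own illustration; it assumes that a blank or dash in the table stands for zero percent, which we check by confirming that each age group's four percentages sum to 100.

```python
AGES = [6, 7, 8, 9, 10, 11, 12]

# Percent of responses by age; blanks in the table are assumed to mean 0.
GIRLS = {
    "naughty": [82, 45, 25, 14, 0, 0, 0],
    "same":    [18, 45, 42, 29, 20, 33, 22],
    "more":    [0, 10, 8, 0, 0, 0, 0],
    "less":    [0, 0, 25, 57, 80, 67, 78],
}
BOYS = {
    "naughty": [50, 27, 45, 29, 8, 0, 0],
    "same":    [37.5, 27, 22, 57, 54, 31, 67],
    "more":    [12.5, 46, 33, 14, 31, 31, 10],
    "less":    [0, 0, 0, 0, 7, 38, 23],
}

def totals_by_age(table):
    """Sum the four category percentages at each age (should be 100)."""
    return [sum(table[cat][i] for cat in table) for i in range(len(AGES))]

def successive_changes(series):
    """Change in percentage from each age to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

for label, table in (("girls", GIRLS), ("boys", BOYS)):
    print(label, "totals:", totals_by_age(table))
    print(label, "give back the same:", table["same"])
    print(label, "change by year:", successive_changes(table["same"]))
print("girls give back less:", GIRLS["less"])
```

For girls, "give back less" rises from none at ages 6 and 7 to 78% at age 12, while "give back the same" never again reaches its age 7 level. For boys, "give back the same" ends higher than it begins but falls in four of the six year-to-year comparisons.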



32. Interpretation of the Data from the Present Study. One objection
to the data analyses in the present study has already been mentioned: the
chi square test is inadequate to test the significance of the linear hypothesis
stemming from Rambert's data and the curvilinear hypothesis suggested
by the findings of the present study.

33. One interesting finding reported by the investigator is that
although both 2nd graders and 8th graders favor telling an authority, their
responses are not identical: 8th graders think more about it. It is too
bad that the investigator describes only a single interview to illustrate
this difference. A different category system, such as the one we proposed
earlier, would have made possible a more penetrating and precise set of
conclusions. Except as noted above, conclusions 1 through 3, dealing with
the first purpose indicated for the study, seem to us to follow from the
data reported.



Question 3: Purpose 2 

34. The investigator has given no reason for expecting intelligence
to be related to the development of a concept of justice. Since justice is
a social concept, we suspect that a case could be made for expecting many
other correlates of moral judgment development more worthy of investigation.

35. Once the decision was made to relate intelligence to moral judgment
development, a mistake was made, we believe, in using I. Q. rather than mental
age as the measure of intelligence. I. Q. is a measure of rate at which the
child is able to grow in intelligence; about half the second graders have
a higher I. Q. score than the average eighth grader. But in terms of sheer
amount of intelligence, approximated by the measure of mental age, very few,
if any, of the second graders would surpass the average eighth grader. It
isn't the "brightness" per se that is believed to be related to degree of
development, but rather the amount that this bright child has learned.






36. When analyses are conducted within a single grade (thus using children
of approximately the same age), then I. Q. and mental ability will be
very highly related and the choice of variable will make little difference.
We suspect, however, that when using several age groups simultaneously to
test the relationship of intelligence to the development of a justice
concept, had mental age been used, a different result would have been found
and the "conflicting" results referred to in conclusion 4 would be less likely.
(Technical note: because of the markedly different standard deviations in
I. Q. scores among the three grades (see paragraph 6, p. 62), the calculation
of mental age scores directly as the product of I. Q. times chronological
age divided by 100 would not be recommended. Preferred are standard scores
computed at each grade level.)
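
The distinction between I. Q. and mental age, and the standard scores mentioned in the technical note, can be sketched as follows. All of the children, ages, and scores below are hypothetical values of our own and are not data from the Durkin study; the mental age formula is the one stated in the technical note.

```python
from statistics import mean, pstdev

def mental_age(iq, chronological_age):
    """Mental age as I. Q. times chronological age divided by 100."""
    return iq * chronological_age / 100

def standard_scores(scores):
    """Express each score in standard deviation units from its own group mean."""
    centre = mean(scores)
    spread = pstdev(scores)
    return [(s - centre) / spread for s in scores]

# Hypothetical children: a bright second grader and an average eighth grader.
bright_second_grader = mental_age(iq=120, chronological_age=7.5)
average_eighth_grader = mental_age(iq=100, chronological_age=13.5)
print(bright_second_grader, average_eighth_grader)

# Hypothetical I. Q. scores within one grade, converted to standard scores.
grade_two_iqs = [90, 100, 110, 120]
grade_two_z = standard_scores(grade_two_iqs)
print([round(z, 2) for z in grade_two_z])
```

In this made-up example the second grader has the higher I. Q. (120 against 100) but the lower mental age (9.0 years against 13.5), which is the distinction drawn in paragraph 35.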






Appendix VII

Florence R. Harris, Margaret K. Johnston,
C. Susan Kelley, and Montrose M. Wolf

Effects of Positive Social Reinforcement on
Regressed Crawling of a Nursery School Child

Journal of Educational Psychology, February 1964.






SPECIAL NOTES 



The chronology of this study can be conveniently divided into several
periods. During the first two weeks of her nursery school experience, Dee
showed strong withdrawal behavior and was off her feet most of the time.
During the third and fourth weeks, the teacher reinforced on-feet behavior
with the result that, "Dee's behavior was indistinguishable from that of
the rest of the children." Next came a crucial 2-day period in which Dee
was given special attention during her off-feet behavior (the reversed
reinforcement contingencies). The results of this change in reinforcement
pattern are shown in Curves 1 and 2 in Figure 1. Thereafter, the regular
reinforcement procedures were restored. The results for the first
two days after the start of this second reversal of procedures are shown in
Curves 3 and 4 on page 119.

Figure 1 is a bit difficult to interpret. The length of the line in
the horizontal direction indicates the length of time Dee was being
systematically observed. Thus, the longest observation period was during the
second day in which attention procedures were reversed; the shortest for
the two days immediately following. The steepness of the curve relative to
the horizontal axis indicates the degree to which Dee was off her feet. Thus,
Dee was on her feet the greatest proportion of the time on the last day shown,
because Curve 4 is not very steep. On the critical first day in which reverse
procedures were followed (Curve 1) Dee was off her feet most of the time
except toward the end of the observation period when, as shown by the bend
in the curve to the horizontal position, she was on her feet. Do not be
misled by the fact that Curve 4 is "in the air." The positioning of the
curves was arbitrary. We suspect that Curve 4 was placed alongside Curve 3
in order to save space, or to remind the reader that the last two curves
refer to consecutive days under the same reinforcement condition.
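The reading of Figure 1 described in these notes can be made concrete with a
small sketch. It assumes that a cumulative record is built from equal-length
observation intervals, each scored 1 if the child was off her feet and 0 if
she was on her feet. The function names and the interval scores are
hypothetical illustrations, not data or procedures taken from the article.

```python
from itertools import accumulate

def cumulative_record(off_feet_flags):
    """Running total of off-feet intervals: the height of the curve over time."""
    return list(accumulate(off_feet_flags))

def steepness(off_feet_flags):
    """Average slope of the curve: proportion of observed intervals off-feet."""
    return sum(off_feet_flags) / len(off_feet_flags)

# Hypothetical session: off-feet for six intervals, then on-feet for two.
session = [1, 1, 1, 1, 1, 1, 0, 0]

curve = cumulative_record(session)
print(curve)               # rises steadily, then runs flat (horizontal) at the end
print(len(session))        # horizontal length = amount of time observed
print(steepness(session))  # 0.75, i.e. off-feet three quarters of the time
```

Under these assumptions, shifting a whole curve up or down changes none of
these quantities, which is why the placement of Curve 4 "in the air" carries
no information.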









Question 1. 

Puzzling behaviors occurred during 
the reversed reinforcement period where 
off-feet behavior is reinforced. Des- 
cribe these unexpected phenomena. 



Answer 1.

Dee became more socially adjusted during the period when attention was
given to off-feet behavior. It was not expected that Dee's return to her
off-feet behavior would be accompanied by greater social adjustment.
She began, "...for the first time to accept, even seek, attention from
the other teacher." * She also exchanged a few words with the other
children, something entirely new for Dee. "The positive effects of
reversing reinforcement contingencies seemed to outweigh by far the
momentary negative results." (p. 121)

We were also puzzled by another event which was not commented upon by
the authors. We would not have expected Dee to return to her
predominantly off-feet behavior as quickly as she did on the first day
that the reverse reinforcement procedures were instituted. As indicated
by the steepness of Curve 1 at its lower left portion, on that first day
Dee appears to have been off her feet from the moment she entered nursery
school.

* P. 120.

Question 2.

The authors conclude that the increased ratio of on-feet to off-feet
behavior in Dee was caused by the teacher's positive social reinforcement
of the on-feet behavior. Other explanations are possible. Dee's increased
on-feet behavior might be explained by at least some of the following:
a) the reinforcement of walking itself; b) increased familiarity with the
nursery school setting; c) the expanded range of rewarding objects (toys
and people) made possible by walking; d) possible physical factors (such
as illness, fatigue, physiological maturation). Decide which explanation,
if any, you think is correct. Give reasons for your choice.




Answer 2.

At least some of the factors mentioned in Question 2 would be reasonable
explanations for the increased on-feet behavior were it not for the fact
that the investigators could change the on-feet to off-feet ratios merely
by changing the focus of the teacher's reinforcement. These four factors
were all present during the off-feet reversal period. We are thus led to
conclude that the return to high off-feet behavior is most likely due to
the one factor that was correspondingly changed: the teacher reinforcement
procedure. If the teacher's social reinforcement were not a causal factor,
removing this reinforcement would not change Dee's on-feet to off-feet
behavior ratio.

If you answered that one of the four factors could account for Dee's
increased on-feet behavior, you are in a predicament. If any of these
factors were responsible for the increased on-feet behavior, then Dee
should have continued her improvement during the reversal procedure,
because these factors were all present at that time. The fact that Dee
reverted to her off-feet behavior implies that any effects of these
factors were overshadowed by the teacher's positive social reinforcement.

Whether researcher or critic, be alert in any research for other
explanations and assess their plausibility. The investigators' use of a
manipulated variable design was effective in dealing with what otherwise
would have been reasonable alternative explanations.



Question 3.

This study is a cause-and-effect study: attention to on-feet behavior (X)
causes a child to change her behavior (Y). It is commonly thought that
phenomena are explained when causes can be correctly identified. How can
we best explain Dee's behavior? Three forms of explanation are as follows:

A. The covering law form. A single instance is explained when it is
subsumed under a general law which "covers" the particular case. For
example, the specific instance in which a spherical object can pass through
an iron ring only when the ring is heated is explained by the general law
that heat causes a metal object to expand.

B. The manipulated variable form. Event X is said to be a cause of Y
because when the experimenter permits X to be present, he gets Y, and when
he removes X, he fails to get Y.

C. The coherent pattern form. Event X is said to relate to event Y when
the many descriptive elements in these events are shown to "fit" together
to form a pattern of relations. In such a case, multiple causes, some
occurring together and some occurring as a sequence of events, are
described. Thus, to explain the causal relations between X and Y it is
necessary to give a full account of the elements involved. This is the
historian's, or case study, form of explanation.

One reason we found this study to be particularly interesting is that the
investigators provide a rich assortment of evidence to support the claim
that attention to on-feet behavior caused Dee to change her behavior. It
is possible to (and we would like you to) explain Dee's behavior using each
of the three forms of explanation described above. Specifically, in regard
to each of these three forms of explanation:

1. cite material from the article itself which could be used to explain
the change in Dee's behavior; and

2. give reasons for being critical of each of these explanations.

Thus, for example, your answer to Question 3A 1 would need to identify a
general law and show how one could claim it "covers" this particular case.
In 3A 2 your response will be a criticism of the explanation presented
in 3A 1.

Note: A complete answer to this question will have six sections: 3A 1,
3A 2, 3B 1, 3B 2, 3C 1, 3C 2. Further, we are not asking you to pick one
form of explanation as "correct." Critically discuss how each applies in
this study.

Answer 3.

A. Covering law form.

1) Evidence:

One way to express the covering general law is: behavior is strengthened
when it is followed by a reward (reinforcement); conversely, behavior is
weakened or eliminated when it is not rewarded. The on-feet behavior was
strengthened because it was followed by a reinforcement (adult attention).
Animal trainers, teachers, and parents have used something like behavior
modification for centuries by providing food, gold stars, or treats when
their charges performed desired behaviors. We generally conclude that the
cause of behavior change is reward. Dee's change in behavior (from off-feet
to on-feet) is the special case subsumed under the law of reinforcement.



2) Criticism:

The specific instance (the Dee case) in this study does not fit the
general law completely. Recall in connection with Question 1 that some of
the positive behavior (e.g. greater social adjustment, playing near other
children, etc.) was NOT weakened when Dee's on-feet behavior was no longer
rewarded. It is perfectly acceptable to claim, as the investigators do,
"...that there were sources of social reinforcement not in coordination
with those controlled in the experiment." (p. 121) It is important,
however, to be able to identify these reinforcers independently of whether
or not they have an effect. If only those actions which change behavior
are called reinforcers, then the law that reinforcement changes behavior
must be true by definition; it is untestable.

Because it is convenient to do so and not because it is a direct answer
to Question 3A 2, we make the following two observations about law and
the covering law form of explanation. First, more than one law can account
for the same observation. The covering law form thus permits multiple
explanations of the same observations. Second, when causal explanations
take this covering law form we generally do not ask for further
explanation, since these events are common and familiar in the experience
of most people. But we may find a law-like relation between events X and Y
and still feel that the relation has not been adequately explained. For
example, we may see that heat causes a metal object to expand, but we may
still feel that we do not have a completely satisfactory explanation of
the expansion of metals. Similarly, we may feel that "reinforcement" is
not a satisfactory explanation.

B. Manipulated variable form.

1) Evidence:

Clearly, the investigators were manipulating an event and studying the
resulting effects. The investigators purposefully increased teacher
attention to Dee's on-feet behavior and gave no attention to off-feet
behavior (X present), then purposefully reversed attention procedures
(X withdrawn), and finally reinstated the original attention procedure
(X again present). Increased on-feet behavior (Y) was evidenced when
attention was directed toward it (X present); when such attention was
reversed (X withdrawn), the investigators failed to get Y.

2) Criticism:

The manipulated variable form of explanation is usually attacked on
grounds that other factors vary as X is manipulated, or that event X is
too broadly stated and that the real cause of Y is only some component
of event X.

Some students have correctly observed that reinforcement was not
withdrawn but rather the particular behavior reinforced was varied.
Since Dee received teacher attention all the time, it is not too
surprising that some of Dee's changes (e.g. greater social adjustment)
did not deteriorate during the 2-day reverse reinforcement period.
Nevertheless, the conclusion that the specific focus of the reinforcement
caused the change in Dee's on-feet to off-feet ratio is not weakened by
the reasoning above, and this conclusion would seem inescapable were it
not for some added reservations spelled out below.





We are struck by the puzzling fact that Dee resumed her old off-feet
habits immediately upon beginning the first day of the 2-day period in
which attention procedures were reversed. If these attention procedures
were such powerful conditioners, then why, on the day immediately
following the final reversal of procedures, was there virtually no
lessening of off-feet behavior (see Curve 3)?

We are left wondering how much off-feet behavior would have occurred
during the critical 2-day period had no change in procedure been
instituted. The data in Curves 1 and 2 would have been more convincing
had similar data been shown for the other children. We are told, for
example, that Dee's playmates in the doll corner were also off their
feet. The proportion of time a child will spend on his feet depends to
some extent upon the type of activity in which he is engaged. We
speculate that during these particular two days Dee perhaps chose to
spend more time indoors involved in activities that naturally lent
themselves to off-feet behavior.

C. Coherent pattern form.

1) Evidence:

A case study admits complex events taking place over a significant
period of time with many variables. The investigators report many of
these descriptive details: the family background; Dee's entry behavior;
the puzzling fact that she regressed to crawling (strong withdrawal
behavior to usual friendly, warm teacher approaches); mother's reports;
the development of the study through the various reinforcement
procedures; social adjustment; and post checks made at irregular
intervals for a year subsequent to the study. (Teachers agreed that
Dee's improved behavior was stable.) The investigators attempt to show
how all these facts fit together in a sensible way.






2) Criticism:

The main criticism is that not enough of Dee's life prior to entry into
nursery school is given, nor do we know enough about Dee's life outside
nursery school (21 hours of each day). Specifically, we have no
information about why Dee might have started to regress to crawling
behavior. We could speculate that her younger brothers (aged 6 months and
18 months) were both still crawling and that while they were rewarded
(attention given) Dee was not. Finally, we need more description of what
other adult and child behavior might have been reinforcing to Dee.

We believe that events are explained when a sufficiently rich
description of these events leaves us without further significant
questions to ask. The incompleteness of this report leads us to describe
it as a demonstration study rather than a case study. It is a
demonstration of the application of the principle of reinforcement rather
than an explanation of how and why these principles work.



Question 4.

One person wrote that, "...the study would have been better if:
a) reliability checks had been made on all the recordings; b) recordings
had been available for more time; and c) there had been better
documentation of what was happening at the various times the child was
on-feet and off-feet." Do you agree? If not, why not? If yes, do you
think such improved record keeping might have changed the authors'
conclusions? In what way(s)?






Answer 4.

We agree. The procedures used in this study were much more casual than
those we would generally expect to find in educational and psychological
research. Dee was observed systematically only a portion of the time.
The change in percent of time off-feet from the 2-day reverse
reinforcement period to the second reversal period might be accounted
for by normal, expected variation. Without a longer, more detailed
accounting of percent off-feet statistics, we have little basis for
assessing the fluctuation which did occur.

We have no indication of how reliably the observers were able to record
Dee's behavior. Particularly welcome would be information on inter-rater
reliability; that is, the extent to which the ratings of several judges
observing the same occurrences agree. Further, the investigators rely on
teachers' judgments and impressions which may be subject to bias due to
their own expectancies.
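As an illustration of what an inter-rater reliability check could look
like, here is a minimal sketch. It assumes two observers each scored the
same sequence of intervals as "on" or "off" feet. The records are
hypothetical, and the two indices shown (simple percent agreement and
Cohen's kappa, which corrects agreement for chance) are common choices that
we have selected for illustration; the article reports neither.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of intervals on which the two observers gave the same score."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed.
    p_chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_chance == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical records from two observers watching the same eight intervals.
observer_1 = ["off", "off", "on", "on", "off", "on", "on", "on"]
observer_2 = ["off", "off", "on", "off", "off", "on", "on", "on"]

print(percent_agreement(observer_1, observer_2))  # 0.875
print(cohens_kappa(observer_1, observer_2))       # 0.75
```

The gap between the two figures shows why raw agreement alone can flatter
a recording procedure: some matches would occur even if the observers
scored at random.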

If we had more precise and complete data, the speculations we made in
our answer to Question 3B 2 would not have been necessary. Although we
doubt that such data would change the main conclusion, there nevertheless
remains the possibility that such alternative explanations (chance
fluctuation, nature of the activities Dee engaged in, and so forth) would
be supported.



Question 5. 

Cite four strengths (i.e. desirable 
features) of this investigation. Attempt 
to identify distinct types of strengths. 

Answer 5. 

There are many positive things to be 
said about this investigation. A few 
strengths are listed below, but this list 
should not be considered complete.




1. An experimental situation was manipulated by the investigators. When
the independent variable is manipulated by the experimenter, we have a
very strong technique for investigating the existence of causal
relations. The fact that the investigators provided, then removed, and
then restored attention to the on-feet behavior of Dee provides strong
evidence that this attention was a cause of whatever effects varied
systematically with changes in attention. The evidence is much stronger
than, for example, if the investigators had merely increased attention to
the on-feet behavior and observed the results.

2. The investigators displayed a concern for the possible negative
consequences on Dee of their study. The investigators were prepared to
terminate the reversal condition if Dee showed, "...any evidence of
detrimental effects, such as loss of speech, crying, or other emotional
behavior." Researchers do not have unlimited rights to manipulate their
subjects. The rationale that society will benefit from such findings is
not sufficient grounds for harming, psychologically or physically, the
particular subjects used in the search for greater knowledge. The
researcher has an obligation to protect the individual.

3. We commend the investigators for seizing an interesting opportunity
(the discovery of Dee) for special study. Significant research is often
conducted when the research is triggered by a puzzling observation or
fortuitous event. Had the investigators first planned a careful system of
observation, for example, and then sought to find a Dee, a more smoothly
executed study might have been the result, but only if a Dee could then
be found. It is better to do what you can with an interesting situation
that presents itself than to let it pass unstudied.






4. The study has direct relevance for educational practices. The major
variable is one that can be manipulated by teachers in classrooms or
other situations. That is, it is in the power of teachers to reinforce
desired behaviors by such social rewards, although we admit that, in some
senses, the situation described in this investigation was well suited for
this purpose. The fact that the study was conducted in a schooling
situation using techniques easily learned facilitates its adoption by
others.

5. The study did provide evidence about the effect of positive
reinforcement. Thus, in the conduct of the study, supportable knowledge
claims were made.

6. One person commented that a strength of the study was:

     To make nursery school teachers and student observers more
     sensitive about the effectiveness of their own behavior in shaping
     children's responses. One child changed; many teachers were. One
     can imagine the hours of meeting time that were devoted to planning
     and discussing this study and its implication; no doubt this was
     excellent in-service education of the teachers and their
     apprentices.

We note the values of research are not limited to the supportable
knowledge claims. Inquiry is a form of learning and is often as valuable
for the process itself as for the direct results it supports.

7. The investigators considered several dependent variables (e.g. social
adjustment) and not just the single variable of on-feet behavior. They
were concerned both with the long range results of the experiment (as
evidenced by their follow-up checks) and with the unintended as well as
the anticipated outcomes.

8. One child was benefited directly.

9. The problem was well stated, the article was logically organized and
the nature of the reinforcement was explicitly defined.



Appendix VIII

David Elkind, Joann Deblinger, and David Adler

Motivation and Creativity: The Context Effect

American Educational Research Journal, Vol. 7, No. 3, May 1970



SPECIAL NOTES

p. 352, p. 356: Putative creativity measures means generally considered to be
creativity measures.

p. 354: Replace the first sentence under the subheading "Design" with the
following:

     The experiment lent itself to an analysis of variance design with
     motivating-condition as the within subjects variable and
     order-of-motivating-condition and students within
     order-of-motivating-condition as the between subjects variables.

The above sentence, which replaces the inaccurate statement in the published
report, indicates that the statistical analysis of variance involved three
variables: 1) motivating condition (interesting task interrupted,
uninteresting task interrupted), 2) order-of-motivating-condition
(interesting task interrupted first, uninteresting task interrupted first),
and 3) students within order-of-motivating-condition (16 students who were
interrupted first from an interesting task and 16 students who were
interrupted first from an uninteresting task). Motivating condition is
considered as a "within subjects variable" because the two conditions being
compared (interesting task interrupted, uninteresting task interrupted)
involve the same 32 students. Variables 2) and 3) are "between subjects
variables" because the comparison of the two orders and the comparison among
the students each involve different subjects.
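The distinction between the within subjects and between subjects variables
may be easier to see when the design is laid out row by row. The sketch
below builds such a layout under the counts given in the note (two orders,
16 students per order, two motivating conditions per student). The student
numbers and label strings are hypothetical, and no scores are included.

```python
from collections import Counter

ORDERS = ("interesting_first", "uninteresting_first")  # between subjects
CONDITIONS = ("interesting", "uninteresting")          # within subjects

def build_design(n_per_order=16):
    """One row per student per motivating condition."""
    rows = []
    student = 0
    for order in ORDERS:
        for _ in range(n_per_order):
            student += 1  # hypothetical student number
            for condition in CONDITIONS:
                rows.append(
                    {"student": student, "order": order, "condition": condition}
                )
    return rows

design = build_design()

# Within subjects: every one of the 32 students appears under each condition.
students_per_condition = Counter(row["condition"] for row in design)

# Between subjects: each student belongs to exactly one order.
orders_per_student = {}
for row in design:
    orders_per_student.setdefault(row["student"], set()).add(row["order"])

print(students_per_condition)
print(all(len(orders) == 1 for orders in orders_per_student.values()))
```

Because each student contributes a score under both conditions, the
condition comparison is made within the same 32 students, whereas the two
orders are compared across two different groups of 16.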

p. 355: Omit the last two lines above the heading, DISCUSSION. By "Groups Under
Order of Motivating Condition" the investigators must mean students within
order-of-motivating-condition, and a significance test of this variable is not
possible given the design used in the study.

p. 356: The creativity-intelligence dichotomy is the separation of creativity
and intelligence into distinct traits so that being highly intelligent does
not necessarily mean being creative and vice versa.



ORIENTING QUESTION OF APPRAISAL

Any study will probably contain key weaknesses and strengths as well as
several more minor ones. By key, we mean those aspects of the study upon which
the value of the work rests most heavily, and without which the study would be
reduced markedly in worth. In an empirical study such as this one, key areas
include: A) quality of reasoning from problem statement to data to conclusions
and implications; B) methods of work (including instrumentation, design and
analysis); and C) defense of the problem's significance. Provide a critique of
the key aspects of the study which emphasizes its key flaws and is organized
into the three areas just identified. Do not concern yourself at this stage
with key strengths of the study.






CRITIQUE 

A. Reasoning from Problem to Data to Conclusions:

Background Considerations

1. "Motivation and Creativity: The Context Effect," is similar to the vast
majority of empirical investigations in education in that the purpose of the
research is to discover and to explain relationships between variables. How
these variables are defined and described is therefore crucial to the value of
any such investigation.

2. The variables of this study are described at varying levels of
abstraction. At a level close to the events, the variables are referred to as
the kind of task interrupted (i.e., crossing out n's and 6's, or activities
indicated by the teacher as interesting) and as the total number of responses
and total number of unique responses to several specific questions. At the
highest level of abstraction, context and creativity are the variables being
related, and motivation is seen as the construct (or abstract "mechanism")
which explains the relationship.

3. The import of a scientific study increases greatly when an investigation
is concerned with variables at higher levels of abstraction. There are two
related reasons for this point. First, predictions which cover a wider range
of observables are possible. Thus, for example, if a relationship is described
in terms of creativity, then we can predict the relation to hold for other
valid measures of creativity. On the other hand, if a relationship is examined
in terms of number of uses of a newspaper, then we have a poorer basis for
predicting performance with other kinds of measures. The more specific the
terms of the examination, the more specific and therefore limited will be the
valid applications. Secondly, the use of constructs helps us to explain the
reasons for the relationship. If we want to understand the reasons for a
relationship, if we want to know the extent of this relationship, and if we
want to know how to allow for this relationship in the practice of education,
it is important to set the observed relationship into an explanatory system or
theory. Some of these ideas are illustrated by Figure 1.



4.  (Diagram: a "Constructs" level shown above an "Observed Variables"
    level; the observed variables are Treatment Conditions and Test
    Performance.)

Figure 1. Variables considered in the present study and their professed
interrelationship.



Not expected to have been stated in a model critique but offered here
for pedagogical purposes.



5. As just indicated above, the importance of a study is greatly enhanced
when it rises beyond providing the relation between observed variables and
yields an inferred relation among constructs. But the validity of this inferred
relation depends in turn upon the validity of the observed variables used to
measure the constructs and upon the adequacy of the intervening constructs.
Consequently, following is our assessment of the context, creativity and
motivation constructs as they are involved in the study.

Context:

6. In the broadest sense used in this study, context is seen as, "the
ongoing activities interrupted by the test procedures." Other statements in
the study lead one to believe that a defining property of the independent
variable is the knowledge of the child that he will be returned to the task
designated as "interesting" or "uninteresting". But it isn't clear, when we
interpret the results, whether differences in scores are to be attributed to
some perceived contrast between past activity and present test-taking tasks,
or to anticipation of some future experience. This difficulty is illustrated
by Figure 2.



7.

Condition       Before Testing           Testing   After Testing

Interesting     Interesting activity     TEST      Resume pretesting activity,
                                                   i.e., return to interesting
                                                   task

Uninteresting   Cross out n's and 6's    TEST      Resume pretesting activity,
                                                   i.e., return to crossing out
                                                   n's and 6's

Fig. 2 Study Design



8. The difference in test performance under the ti/o conditions 
might be due to the nature of the interrupted before testing task (as the investi- 
gators suggest) or to the differential pull of the anticipated post-testing 
activities. For exanple, students in the uninteresting condition might have per- 
fonned better on the tests nut because they were happy to oet out of an unple- 
sant task but because they persisteeT&s the test to delay their return to the ' 
unpleasant task. The study^ design doel not permit one to assess which context 
(the pre-testing or the post- testing) Is the irore important. 



Hotivation : 

9. The investigators appear to have been too quick to infer that
motivation is the appropriate construct to be used to explain the differential
test performance. Other concepts which could explain the results include:
need for novelty, desire to return to a pleasant state, drive for optimal
stimulation, etc. Still other concepts are suggested by the vast literature
dealing with work conditions and production. We did not include the Zeigarnik
effect (the ability to recall unfinished tasks more than completed ones) because
there is no reason to believe that the "interesting" and "uninteresting" tasks
would have a differential "pull" since both were interrupted tasks.

10. It is possible to argue that all the additional concepts being suggested
are really what is meant by motivation, that is, that motivation is any drive,
desire or need. Such a concept of motivation is so broad and pervasive that
it can be used to "explain" just about everything and, consequently, explains
nothing. There is no way to distinguish motivating from non-motivating contexts
that is independent of test performance.



Creativity : 

11. There is no attempt made to show that the tests employed are adequate
tests of creativity. In fact, this task is disavowed:

    It is not our intention here to deal with the issue
    of whether these tests measure 'creativity' or
    something else. In this regard we tend to side with
    Cronbach who argues that 'creativity' is too value laden
    and that names for particular tests should be used to
    designate the measure in question.

This move on the part of the investigators succeeds in insulating the argument
from the objection that creativity is not really measured by the tests, but
the price of this success is the triviality of the conclusion. We can only
conclude that there is a relation between context and some test scores rather
than between context and creativity.

12. However, in spite of the quotation cited above, the investigators
believe themselves to be dealing with creativity. In numerous places in the
text, as well as in the title and the abstract, the investigators refer to the
dependent variable as creativity measures. Further, the most plausible reason
for lumping together the scores of the three tests when drawing conclusions is
that the three tests are measures of the same thing, presumably creativity.

13. The investigators are trying to have it both ways. They want to eat
their cake (protect themselves from the objection that creativity is not really
measured by the tests) and have it too (use "creativity measures" in the
discussion and conclusions).



B. Methods of Work

14. At this point in the review, we shall consider as the variables of
interest the kind of task interrupted and the score on selected tests. Even
at this low level of abstraction, three aspects of the research procedure hinder
our interpretation of the relationship found: namely, the atypical character of
the interrupted tasks, especially the uninteresting one; the inadequate
description of the testing situation; and the use of a special school (WOIS)
as the locus for the research.



Atypical Character of the Interrupted Tasks : 

15. An immediate, practical result of the discovery of a context effect would
be to alert the educational practitioner and researcher to the need to be
concerned with the activity a child is engaged in prior to any testing
situation. By using tasks seen as unrepresentative of the school situation
(such as crossing out letters and numbers) the investigators reduce the
likelihood that their findings will have relevance to other situations.

16. In defense of the investigators, it is sometimes wise to attempt to
obtain the relationship desired using extreme conditions which, in this case,
would be very boring and very interesting tasks. If the effects are not evident
given extreme conditions, the investigator can feel quite safe in concluding
the independent variable is not an important determiner of test performance
in a more usual situation. If the effects are evident under extreme conditions,
then future research can be directed toward the assessment of the effect in
a variety of more realistic situations.

Inadequate Description of the Testing Situation :

17. There is a gross lack of information given about the conduct of the
creativity testing. The reader needs assurance that the testing conditions were
identical under the two motivating conditions. Were the tests administered in
a group situation (with the two motivating conditions separate) so that if
some child were brave enough to get up and leave others might follow? What
subtle cues about how long the children could work on the task may have been
present? What were the children told about how long they could work on the
tests, and were these instructions consistent with instructions usually given
for such tests? Did the children think that if they "finished" early they could
(or had to) return early to the task which had been interrupted? In short,
there are far too many unanswered questions about the administration of the
tests. The entire difference in test scores under the two conditions might be
explained by the factors just identified.



Use of a Special School : 

18. The importance of the nature of the interrupted task might not have
been so marked had the study been conducted in a more typical school setting.
The child in a special school may view the interesting and uninteresting
activities as quite different in kind, and thus capable of producing marked
differences in test performance. In a regular school, the variety of tasks
which could be interrupted by testing would likely differ by degree rather
than by kind; in fact the testing situation might be seen as an enjoyable
diversion regardless of the task interrupted. We do not claim that such
speculation on our part is correct; we only remind you that studies conducted
in a very special situation may not generalize beyond it.



Note About Analysis :

19. We do take some exception to the way the data were analyzed and reported.
We doubt, however, that our interpretation of results would be much changed
had a more adequate analysis been performed. A great many readers cited the
use of only 32 pupils as a weakness of the study. We were not bothered by this
sample size for two reasons. First, since each pupil was tested under two
conditions, the effective sample size was greater than 32. Second, large samples
are desired to help insure that real differences in conditions will be detected
and not attributed to chance. However, a larger sample was not needed because
the results of the present study were statistically significant even with the
"small" sample used. (The specific questions and answers section of these
appraisal materials deals, in part, with data analysis and interpretation.)



C. Significance of the Problem

20. One student, in appraising the research problem, argued:

    This study has very little educational significance.
    The primary reason is that they sought
    to demonstrate something which is already well
    accepted by psychologists and educators. It
    should surprise no one to find that level of
    motivation has an effect on the performance on
    a test which is at least partially scored on the
    basis of number of responses emitted.

The investigators themselves admit that others have shown the effect of
motivation on test performance. The present study, however, deals with a
motivational context of special interest to educators, namely, what the
child was doing before taking the tests. It seems to us very important to
know whether scores on tests such as those given in this study can be
influenced to such a large extent by something as seemingly innocuous as the
nature of the activity preceding the test administration.



21. Thus, we view the research problem to be significant. Because of the
many concerns discussed above, however, we feel that the chief value of the
study as conducted is only to remind and caution us that some attention to
context is required when testing for creativity. The findings, nevertheless,
are provocative enough to warrant additional research on the question.






SPECIFIC QUESTIONS*






1. "The study was suggested by some unexpected findings
that we encountered in our evaluation of the innovative
educational program..." Is it legitimate to develop
research from unexpected findings? Explain why you
answered as you did.



1. Yes, it is legitimate to develop research from an
unexpected finding. A puzzling observation, an
anomaly, or an unusual situation has often been the
precursor of significant research. What is unexpected
is, of course, a function of what is expected. In this
case the educational expectation of increased creativity
for children in a school setting stimulating inquiry and
free choice was not upheld. The educators expected these
children to be more, not less, creative than children in
traditional schools.

Every situation has its idiosyncratic aspects. If one
looks at enough features in a given situation, one or
more of the observables is apt to look unusual by chance
alone. Thus, although a puzzlement in one situation
might well stimulate further study in an effort to seek
replication or explanation of the phenomena, the first
occurrence of a puzzlement should not be taken too
seriously in and of itself. Not all perplexities are
worthy of serious further investigation.



Student Response 

Unexpected findings "suggest that some important variables
had not been considered or that there is some flaw
in the experimental design."



Our Reply 



A good point. Before rushing out to seek replication
of an unexpected finding, the researcher should
re-examine all the procedures of the study which resulted
in such phenomena in search of "flaws" which may alone
account for the puzzling observation.






*NOTE: The order of these questions is the same as the paragraph sequence
of the published paper.






2. The unexpected findings which motivated the present study
resulted from a comparison of World of Inquiry School
children with children selected "from names on the
waiting list for acceptance into WOIS". In this earlier
evaluation of the WOIS and its effect on creativity do
you approve of matching the experimental group children
with children from the waiting list, or do you think the
investigators should have selected the control children
randomly from the public schools regardless of whether
they were on the waiting list? Why?



2. When comparisons are to be made, validity can be maximized
if the comparison groups are as identical as possible
except for the variables to be examined. If the two
groups of school children were different before the one
group attended the WOIS, then it is difficult to separate
these initial differences from the effects the WOIS was
responsible for. Of the two choices given in Question 2,
taking control children from the waiting lists appears to
be more valid. We can infer that such children are more
likely to come from a home environment more similar to
the actual WOIS children than children whose parents chose
to send them to public schools.

Of course, the answer to Question 2 would be different if
there were only a few children on the waiting list (and
thus obtaining a good match with the WOIS children would
be impossible) or if there is some systematic bias in the
way in which the children who were made to wait were
different from those who were accepted immediately.
Thus, before giving a firm answer to Question 2, it would
be helpful to know why some children were accepted and some
children were made to wait. In the absence of a differential
selection policy, we support the investigators' tactic of
choosing controls from the waiting list group.






Student Response

The investigators should have chosen the control group
randomly from the general population to "give more
credence to the generalizability of the study." Further,
"those on the waiting list are a special population,
perhaps more 'creative' (or motivated) than the average
public school student."



Our Reply

Our question refers to an earlier study which had produced
the unexpected findings regarding "creativity". The purpose
of that earlier study was to evaluate the WOIS. It was
therefore necessary to use children who were as identical
to the WOIS children as possible so that the differences
between the two groups of students could be attributed to
the WOIS experiences rather than other factors.
To be sure, the students in the WOIS may be atypical
and the ratings of effectiveness of the WOIS may
not generalize to a more typical student population.
But for the rather specific purpose of determining
if the WOIS itself had any impact at all, it is
necessary that the comparison group be as atypical
(in the same ways) as the students in the school.
A good available source for a control group was the
WOIS waiting list, the source actually used.

Student Response 

"It seems to me that the choice of control group
depends on the question the researchers wanted to
be able to answer. If the question was: Are children
in the WOIS more 'creative' than children in public
schools?, then the control group should be chosen
randomly. However, such a question would not tell
us anything about the effect of the WOIS curriculum
on encouraging 'creativity' in its students.
Children on the waiting list, however, have already
been admitted to the school, and therefore ought to
be more similar, by whatever criteria the WOIS
uses for admission, to the WOIS school population
than a random sample of public school pupils.
Thus, if WOIS pupils performed better on the
battery than potential WOIS pupils, one would
be in a position to infer that, for children
likely to meet WOIS standards, the WOIS curriculum
does promote 'creativity' to a greater degree than
does the public school curriculum."

Our Reply

We agree.



3. The investigators state (p. 353, paragraph 1):
"Inasmuch as each child who participated in the
study served as his own control, we made no attempt
to control for or to equate individual differences
in ability." (a) What does it mean for a child
to serve as his own control? (b) Were the
researchers justified in not equating individual
differences?






3. (a) To serve as their own control means that
observations to be compared involve the same objects
(usually people). In this study, the students
served as their own controls since the test scores
to be compared were produced by the same children:
once after they were interrupted from an interesting
task and once after they were interrupted from an
uninteresting task.

Student Response 

When a child serves as his own control, you have "a
repeated measures design."

Our Reply 

That is correct. "Repeated measures" occurs most
frequently in the statistical literature concerned
with the analysis of experimental data obtained when
subjects serve as their own control.

Student Response 

To serve as his own control means "the behavior and
responses of each child were reflected from his own
personal experience", or that "children were stratified
by age," or that "children were matched."

Our Reply

These answers are incorrect.

(b) Yes, the investigators were justified in not
equating individual differences in ability. They
did not wish to match students and restrict the
population of children any more than was already
the case by virtue of the fact that only WOIS
students were involved in the study. Further,
it was not necessary to pick the children
carefully because the primary interest in the
study was the comparison between each pupil's
score under motivating condition 1 and his own
score under motivating condition 2.






4. Should different (but matched) children have been used
in the two motivating conditions rather than exposing
the same children to both conditions? Why?






4. This is an extremely difficult question to answer.
Any research design is a compromise. Using children
as their own control in a repeated measures design,
as in this study, has both advantages and disadvantages
over the design in which matched groups of children
are employed. It is a trade-off.

In the case of performance measures, individual
differences account for most of the variability and
treatments (such as the two motivating conditions)
often make relatively little difference. Since this
is the case, it is important that differences among
the individuals in the two treatment groups be as
small as possible so that the relatively small
treatment (context) effect will not be masked. The great
advantage of the design actually used (the subjects
as their own control design) is that the differences
between the individuals in the two treatment groups
have been minimized. Indeed, they are the same people.
A proper evaluation of the motivation conditions
effects would involve a comparison of the magnitude
of the difference in test performance under the two
conditions with the magnitude of the "unaccounted"
for differences. When the subjects are their own
control, the unaccounted for differences are
reduced tremendously, making us more confident of the
accuracy of the treatment differences observed.
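
The point about "unaccounted" for differences can be illustrated with a small
sketch. All numbers below are invented for illustration and are not data from
the study; the variable names are our own. We assume six hypothetical children
who differ widely in ability and a small context effect of 3 points.

```python
from statistics import mean, stdev

# Hypothetical numbers, invented for illustration; not data from the study.
ability = [10, 25, 40, 55, 70, 85]   # stable individual differences
context_effect = 3                    # assumed small treatment effect
noise_interesting = [1, -1, 0, 2, -2, 0]
noise_uninteresting = [0, 1, -1, -1, 0, 1]

# Each child is scored once under each condition.
interesting = [a + n for a, n in zip(ability, noise_interesting)]
uninteresting = [a + context_effect + n
                 for a, n in zip(ability, noise_uninteresting)]

# Subjects as their own control: ability cancels in each child's difference.
differences = [u - i for u, i in zip(uninteresting, interesting)]
mean_difference = mean(differences)
sd_within = stdev(differences)

# Spread among different children within one condition, which is the
# background against which two separate groups would have to be compared.
sd_between = stdev(interesting)

print(mean_difference, sd_within, round(sd_between, 1))
```

In this made-up case the 3-point treatment difference is judged against a
spread of 2 points when children are their own control, but against a spread
of roughly 28 points if different children had to be compared.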

The drawback to using subjects as their own control
is that such a design does not protect against what
is called a differential "carry over" effect. To
illustrate this effect, assume that instead of two
motivating conditions, two drugs, A and B, were
used. Further assume that drug A affects
performance while drug B does not, and the effect of
drug A is carried over to the time that drug B is
tested. Any measure of the performance of the
group that received drug B second would not be
a true indication of the effect of drug B alone,
since drug A would still be in effect, and any
conclusions would thus be in error. Since
differential carry-over effects are unlikely in
the study as actually conducted, we would support
the repeated measures design actually employed
by the investigators.
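
The drug illustration can be worked through numerically. The sketch below is
ours, not the investigators'; the function name, the baseline of 50, the
10-point effect of drug A, and the assumption that half of a drug's effect
lingers into the next session are all hypothetical.

```python
def observed_scores(order, baseline, effect, carry):
    """Return {drug: observed score} for one subject tested in `order`.

    Assumed model: a fraction `carry` of the previous drug's effect
    is still present when the next drug is tested.
    """
    scores = {}
    residue = 0.0
    for drug in order:
        scores[drug] = baseline + effect[drug] + residue
        residue = carry * effect[drug]
    return scores

effect = {"A": 10.0, "B": 0.0}   # drug A works, drug B does not (assumed)
baseline = 50.0
carry = 0.5

a_then_b = observed_scores("AB", baseline, effect, carry)
b_then_a = observed_scores("BA", baseline, effect, carry)

true_b = baseline + effect["B"]
# Even with both orders used equally often, drug B looks better than it is,
# because only the A-then-B order carries anything over.
mean_observed_b = (a_then_b["B"] + b_then_a["B"]) / 2
print(true_b, mean_observed_b)
```

Under these assumptions drug B appears to raise performance although it does
nothing, which is the sense in which a carry-over effect is "differential".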

Student Responses 

"Using different children would have produced a
tighter control on any testing or practice effect."

"There was a definite possibility of contamination
in the design. Being exposed to the first form of
the test might well have influenced the nature of
the responses to the second form."

"Having to take two equivalent forms of the same
test might involve a carry-over so that the child
would remember and become more proficient the
second time the test is taken."

"Some type of learning took place during the
first testing and perhaps some modifications
occur between the first and second testing."

Our Reply

The practice or testing effect is controlled since the
order in which the two treatment conditions are given
is counterbalanced: half the children are first
removed from an interesting task, half the children
are first removed from the uninteresting task. Thus,
the above student responses are not completely
accurate; they need to specify that any "contamination"
or "carryover" effect would be differential in nature
as explained in the second paragraph of our initial
answer to this question. Why would this practice
effect or learning be greater (or less) going from
motivating condition 1 to motivating condition 2
than going from condition 2 to 1?
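
Why counterbalancing controls a uniform practice effect can be shown with a
short sketch. The scoring model and all numbers are our own assumptions for
illustration: a 4-point context effect and a 3-point gain on whichever test is
taken second, regardless of condition.

```python
from statistics import mean

def session_scores(order, base, context_effect, practice_gain):
    """Scores for one child; the second test taken gets the practice gain."""
    return {cond: base + context_effect[cond] + (practice_gain if pos == 1 else 0)
            for pos, cond in enumerate(order)}

context_effect = {"interesting": 0, "uninteresting": 4}  # assumed
practice_gain = 3                                        # assumed, same in both orders
base = 20

first_order = session_scores(("interesting", "uninteresting"),
                             base, context_effect, practice_gain)
second_order = session_scores(("uninteresting", "interesting"),
                              base, context_effect, practice_gain)

def gap(children):
    """Mean uninteresting score minus mean interesting score."""
    return (mean(c["uninteresting"] for c in children)
            - mean(c["interesting"] for c in children))

counterbalanced_gap = gap([first_order, second_order])  # half in each order
single_order_gap = gap([first_order, first_order])      # everyone in one order
print(counterbalanced_gap, single_order_gap)
```

With both orders equally represented the practice gain is added to each
condition equally often and drops out of the comparison; with a single order
it is mistaken for part of the context effect.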

Student Responses 

Use of the same children is preferable to employing
matched groups because: (i) "there was no pretest
to help make a good match," (ii) "selection bias
would take place," (iii) "it is hard to equate
groups," and (iv) there are "too many variables to
match children."



Our Reply 

For matching to be maximally effective, two conditions
need to be met. First, matching variables highly
related to the criterion measures (creativity scores in
this study) must be used. If, as implied in student
response (i), such matching variables are not available,
then the effectiveness of the matching strategy would be
reduced. Second, random assignment to the two treatment
conditions must be made after matching has taken place.
This procedure protects against the selection bias
referred to in student response (ii). As long as the
above two conditions are met, matching can be highly
effective and free from bias. Contrary to what is implied
in responses (iii) and (iv), the groups need not be
equated on numerous variables.
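
The two conditions can be sketched as a procedure: rank the children on one
matching variable, pair neighbors, then assign the members of each pair to the
two treatments at random. The function name, the children, and their pretest
scores below are hypothetical and serve only as an illustration.

```python
import random
from statistics import mean

def matched_random_assignment(children, rng):
    """Pair children on a matching score, then randomize within each pair.

    children: list of (name, matching_score) tuples.
    Returns (group_1, group_2). With an odd count the last child is unused.
    """
    ranked = sorted(children, key=lambda child: child[1])
    group_1, group_2 = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)            # random assignment AFTER matching
        group_1.append(pair[0])
        group_2.append(pair[1])
    return group_1, group_2

# Hypothetical pretest scores assumed to relate to the criterion measure.
children = [("c1", 100), ("c2", 123), ("c3", 91), ("c4", 110),
            ("c5", 90), ("c6", 120), ("c7", 102), ("c8", 111)]

group_1, group_2 = matched_random_assignment(children, random.Random(0))
mean_gap = abs(mean(s for _, s in group_1) - mean(s for _, s in group_2))
print(len(group_1), len(group_2), mean_gap)
```

Only one matching variable is used here, in keeping with the remark that the
groups need not be equated on numerous variables.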



5. The investigators state (p. 353, paragraph 1) that having
children from several grades and of different ages allows
for greater generality of conclusions than if a more
homogeneous group were used. Do you agree? Explain.






5. Yes, an investigator can make broader conclusions when
he has employed a variety of subject types or has done
his research in a variety of research settings. This
statement presupposes that the investigator has analyzed
his data by these subgroups. In this study, the
investigators have performed such analysis for several age groups.
We are NOT saying that the same conclusions will necessarily
be valid for each of the various groups and research
settings, but only that given proper analysis, one will
be able to make a more general set of conclusions using
a heterogeneous mixture of subjects than if a homogeneous
subject pool is used.

Student Responses 

"One must have adequate numbers to generalize with
confidence."

"Since the sample was so small, it may be difficult to
generalize for several grades and different ages."

"WOIS kids are not a normal group."

Our Reply 

This point is well taken. Because of the few children
in each subcategory, the likelihood is small that
the investigator will be able to make statements about
the differences by such subgroups with confidence.
Further, the special school setting limits
generalizability to such schools. Thus, while we definitely
agree with the investigators' statement as provided
in question 5, at the same time we recognize that
the generalizability actually achieved in the study
is limited.



6. The researchers should have used more than one Puerto
Rican student because it is too likely that an
unusual student was in some way chosen. Comment.



6. If the investigators wished to generalize their
results to Puerto Rican students, then clearly more
Puerto Rican students are needed for this purpose.
The one student may not be typical. If the researchers
wish to generalize to the WOIS population (as they
clearly state they do), then one Puerto Rican, the
investigators assure us, makes about the right
proportion. In fact, to use many more than one such
student would make the sample unrepresentative of
the WOIS population and hinder attempts to make
accurate generalizations about the school population.






7. Do you think it was important that the investigators show
the two forms of the creativity tests to be equivalent?
Why?

7. Although a sensible thing to do, having strict equivalence
between the two forms of the tests was not essential.
Recall that the order in which the test forms were
administered was counterbalanced: that is, half the
time one form was given first and half the time the
other form was given first. Although not explicitly
stated, we believe it reasonable to assume that half
the time one form was given after the uninteresting
task was interrupted and half the time the other form
was given after the uninteresting task. Thus we can
expect any differences in the forms (such as degree
of difficulty) to balance out since neither of the
motivating conditions or order of motivating
conditions is associated with one form of the test more
than the other form. In this study, equating the
forms of the tests is a reasonable, but not essential,
procedure to follow.
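
The balancing argument can be checked with a few lines of arithmetic. The
sketch below is ours and rests on invented numbers: a 4-point context effect,
a form B that is assumed to be 6 points harder than form A, and eight
hypothetical scores in every condition-by-form cell.

```python
from itertools import product
from statistics import mean

context_effect = {"interesting": 0, "uninteresting": 4}  # assumed
form_offset = {"A": 0, "B": -6}                          # form B assumed harder
base = 30

def condition_mean(cells, condition):
    """Mean score for one condition over (condition, form) cells."""
    return mean(base + context_effect[c] + form_offset[f]
                for c, f in cells if c == condition)

def gap(cells):
    """Mean uninteresting score minus mean interesting score."""
    return (condition_mean(cells, "uninteresting")
            - condition_mean(cells, "interesting"))

# Counterbalanced: each form appears equally often under each condition.
balanced = [(c, f) for c, f in product(context_effect, form_offset)
            for _ in range(8)]

# Not counterbalanced: the harder form always goes with one condition.
confounded = [("interesting", "B")] * 16 + [("uninteresting", "A")] * 16

balanced_gap = gap(balanced)
confounded_gap = gap(confounded)
print(balanced_gap, confounded_gap)
```

When the forms are spread evenly, the unequal difficulty lowers both condition
means by the same amount and the 4-point gap is recovered; when one form is
tied to one condition, the form difference is added to the apparent effect.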






Student Response

It was important to show that the two forms of the tests
were equivalent because "if the difficulty of the tests
were different, no conclusions could be reached." Further,
"the differences found in the results could be due to the
tests rather than to the treatment."

Our Reply 

We disagree, for reasons given in our initial answer to
this question. Had the investigators not used a
counterbalanced design, then we too would have wanted the tests
equivalent.



Student Responses

"If the tests had not been equivalent, it would not have
been possible to accurately measure score changes."

"It would be very difficult, if not impossible, to get
accurate, valid measurement of a child's difference in
scores if the difficulty of the tests were different."

Our Reply

It is certainly true that it is difficult to interpret
an individual child's difference in the scores of two
tests if the tests don't have equivalent units. But
in this investigation an individual's difference score
was not even computed. Each mean that was computed
(see Table 1 in the research report) involved an
equal number of scores on the two forms of the test.






8. "The 'interesting' condition was determined by the
child's own interests as indicated by the teacher."
Do you approve of this procedure? Explain.

8. We approve of this procedure and other approaches too.
Of course, the investigators and readers want some
assurance that the task engaged in was interesting
to the child. One way to do that is to get such
assurance from the child himself. Another reasonable
way, it seems to us, is to trust the teachers'
judgments, the procedure actually followed. Both of
these approaches, asking the child himself and asking
the teacher, may be subject to a bias produced when
activities are reported or judged to be more interesting
than they really are. (The effect of such a bias is
to reduce the difference between "interesting" and
"uninteresting" activities and, consequently, to make
finding significant differences between the two
treatments more difficult.)

Another possible procedure would have been to parallel
what was done for the uninteresting task and put all
the students in a situation the investigators believe
to be interesting to the vast majority of the children.
The problem with this procedure is that it is difficult
to devise a task one can be sure will be of high
interest to a substantial number of children. The
advantage of this procedure is that it makes it
possible to specify exactly what the interesting
task is and to control when the child will be ready
to begin testing.

As said before, research design involves compromises
and trade-offs. Using teachers' judgments seems to
be a reasonable choice, although we would defend
as well the other two procedures we mentioned.

Student Responses

"I'm not sure that the teacher can accurately
determine those conditions which are interesting
to a child."

"Teachers may sometimes be deceived as to a
particular child's interest."






"It would have been better to get the child's
interest from himself."

"I would tend to trust the involved person's
judgment more."

"Let the child speak for himself."




Our Reply 

These are reasonable responses. As indicated in
our initial answer to this question, "One way is
to get such assurance from the child himself." We
admit that when possible, measuring something in the
most direct available way is often the best procedure.
In this case, such a procedure would involve going
straight to the child and asking pointedly how
interested he is in a particular activity.

Student Responses 

"I could only accept the teacher's determination of
each child's interest if I knew exactly how she
determined it. There is evidence of a lack of
control here."

"I don't feel the researchers described this
procedure well enough."

Our Reply 

Perfectly appropriate reactions. 

Student Response 

I do not like the procedure of using a teacher's judgment
because "this is not an objective method of assessing
the interesting condition."

"No, these are subjective observations."

Our Reply 

By "subjective" we assume the students mean that not
all observers would agree with the teacher that the
child was interested in a particular task. It is
true that there is a subjective element in this
method of assessment, and it is also true that
when inter-judge agreement is absent, the ratings
of any one person are very likely to be invalid.
Nevertheless, we would caution against an offhand
dismissal of all subjective measurements.
The phenomena we may have the greatest difficulty
measuring may sometimes be those most worth
measuring. One must often ask whether it is
better to measure something trivial well or to
measure something important poorly.






9. One critic of this study stated that the research
assumes for its validity that all children were
equally interested in the "interesting" activity.
Do you agree with this statement? Why?

9. We disagree. The assumption being made is that any
given child will be substantially more interested
in the "interesting" task than in the "uninteresting"
activity. There is no reason why all children must
be equally interested in their "interesting" activity,
nor is it reasonable to assume they will be. The
comparisons of interest are between individual
performances under different conditions and not among
children in the same condition.



Student Responses

"There is no way to say that all children were
equally interested."

"It is very difficult to measure equal interest."

"The general category, is interested, is too

"The researchers did not take into account the
degree of interest in the interesting activity."

"The term interesting can change from day to
day with this age group."



Our Reply

The above statements seem to us to be irrelevant
to the question asked. The implication in the
above student responses is that it would be
virtually impossible to demonstrate whether or
not the children were equally interested in
the "interesting" activity. Although this
claim may be true, Question 9 merely asks
if there must be equal interest for the
study to be valid.

10. The researchers state that each child doing the
uninteresting task "was given the same
instructions about leaving to 'play games' and
about returning to the ongoing activity," as the
children in the interesting task condition had
received. Do you approve of using the same
instructions in both situations? Tell why you
answered as you did.







10. We do approve of using the same instructions in
both situations. We want the two motivating
conditions to be as identical as possible in
all respects except for those variables the
investigators explicitly wish to study. Such
similarity makes interpretation of results less
ambiguous.

Student Responses

"The use of the words 'play games' could have
influenced the attitude of the child."

"It does not make good common sense to instruct
a child to leave an interesting activity to go
play games."

"The 'return to ongoing activity' phrase seems
to provide a key to the results, i.e. child
interrupted from the uninteresting task took
more time and gave more responses before return."

Our Reply 

One can take exception to the wording of the
instructions, as did the students whose responses
are quoted above, and still believe, as we do,
that the instructions should be the same for
both treatment groups. (Of course "ongoing
activity" will mean different things depending
on which kind of task was interrupted. But
this difference was precisely the difference
the investigators wanted to study.)

11. In order to support the claim that the
interesting and uninteresting tasks indeed
held those qualities for the children tested,
the investigators reported their qualitative
impressions of the students' feelings about
being interrupted; e.g., "That the ongoing
activity was indeed interesting to the child
was evidenced by the groans, grimaces and
footdragging that accompanied the examiner's
request" and "The children complained while
doing the (uninteresting) task, some called
it 'stupid,' and...were uniformly delighted
when their participation in the games was
requested." Do you approve of such
impressionistic reporting in research studies of
this type? Why?






11. We approve of such impressionistic reporting. The
researcher should be alert to make observations of all
phenomena associated with the research investigation.
Such observations help us to interpret the more
objective data which are available. They provide a
fuller picture of the research context and, in this
study, lend support to the judgments about task
interestedness. Of course, the investigator must
be alert to the possibility of experimenter bias
and to evidence which is contrary to his position
as well as to that which supports his position, and
must report both kinds of observations.

Student Responses

We do not approve of such reporting because "children
do not mean what they say."

"Emotions could have been made for other reasons."

"Children very often imitate the expressions of
their peers without actually feeling the same way."

"The investigators appear to have jumped to conclusions
in the matter of children's behavior."






Our Reply

The above responses clearly suggest that the children's
behavior should not be taken at face value and that
caution should be exerted in interpreting these
impressions of the children's feelings. That the
impressions may be difficult to interpret, however,
does not lead us to abandon them altogether.



Student Responses

We do not approve of such reporting because it "calls
for subjective judgment."

"Findings should include only quantitative measures."

"Impressions are not an empirical measurement."

"Reports are not objective but interesting!"

Our Reply

See our reply to the last set of student reactions to
Question 8.






12. Note the design as indicated in the first full paragraph
of page 354 above the subheading "Design." The
investigators want to claim that the kind of activity engaged
in (interrupted) before taking the "creativity" tests
affects test performance. How many of the following
variables have been controlled; that is, which variables
are ruled out by the design as alternative
explanations for the differential test results found:
(a) order in which the two kinds of pretest activities
were interrupted; (b) form of the creativity tests;
(c) sex of the child; (d) age of the child; (e)
"real creativity" of the child?

12. All five variables were controlled. We cannot
attribute the observed differences between scores on the
"creativity" tests which were taken after an interesting
task was interrupted and the scores of the tests after
the uninteresting task, to differences in the order in
which the two kinds of pretest activities were
interrupted; half the children were interrupted from the
uninteresting task first and the other half were
interrupted from the interesting task first. Further,
scores from the two forms of the test are equally
represented in the two sets of scores being compared.
(See Table 1.) Finally, since each child was his
own control, that is, was being compared against
himself, the sex, age and other characteristics
such as "real creativity" were also being controlled.
The utilization of a design which rules out so many
rival explanations to account for the observed
differences in test scores under two different
motivating conditions is one of the strengths of the
present study.

Student Responses

"No attempt was made to measure 'real creativity.'"

"No one dared to define 'real' creativity."

"Creativity is only a function of the test used."

Our Reply

Variables such as size of little toe and "real creativity"
can be controlled in an experiment even if measurements
of these variables are not made or are not possible. One
way to do this is by random assignment to treatment groups.
Another way to control ability and personality
characteristics is to administer the different treatments to
the same person, the technique actually used by the
investigators. Since the same people are involved in the two
conditions, one cannot claim that the reason for differences
in test scores between treatments is because the subjects
in one condition were older, had more "real creativity,"
or had longer little toes.
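
The idea that each child serves as his own control can be made concrete with a small simulation. What follows is only a sketch under stated assumptions: the baseline values, the 25-point condition effect, and every name in it are invented for illustration and are not data from the study. Each child's unmeasured traits are lumped into one baseline that enters both of that child's scores, so it drops out of the within-child difference.

```python
import random

random.seed(0)

# Hypothetical, unmeasured stable traits of 32 children (age, "real
# creativity", and so on), lumped into one baseline. Invented numbers.
baselines = [random.uniform(10, 60) for _ in range(32)]

CONDITION_EFFECT = 25.0  # assumed boost after the uninteresting task


def score(baseline, uninteresting):
    """Score = the child's own baseline plus the condition effect."""
    return baseline + (CONDITION_EFFECT if uninteresting else 0.0)


# Each child is tested under both conditions, so the baseline appears
# in both scores and cancels when the two are subtracted.
differences = [score(b, True) - score(b, False) for b in baselines]

# The baselines vary widely, the within-child differences do not.
baseline_spread = max(baselines) - min(baselines)
difference_spread = max(differences) - min(differences)
```

In this sketch `baseline_spread` is large while `difference_spread` is zero up to rounding, which is the sense in which the traits are "controlled" without ever being measured.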






Student Responses

"I don't know what you mean by 'real creativity.'"

"The concept 'real creativity' confuses me."

"What the hell does 'real creativity' mean?"

Our Reply

We too don't know what "real creativity" means. We
used this vague term to emphasize the point that it
doesn't matter what such terms mean (for purposes
of the issues discussed in this question) since
each child is being compared to himself/herself.
It is in this sense that we say that "real creativity"
has been controlled; it has been ruled out as an
explanation for the finding of treatment difference.



13. Competent critiques of experiments require the
reviewer to comprehend fully the research design.
Especially when several variables are used, many
readers find it useful to construct schematic
diagrams to serve as a visual reminder of the
experimental set-up. For example, if three
students (S1, S2, and S3) received treatment M1
and three other students received treatment M2,
this arrangement might be pictured as shown in
equivalent Figures 13-1 and 13-2.

[Figures 13-1 and 13-2: two equivalent schematic diagrams
of students S1 to S6 under treatments M1 and M2.]

For the present study, consider how the design used
for the analysis of variance calculations might be
illustrated. First, review the special note about
p. 354. Second, pick which one(s) of the four
schematic diagrams below correctly display(s) the
design used.

You do not need to know anything about analysis of
variance to answer this question. You do need to
know that M1 was used to represent the interesting
motivation condition and M2 the uninteresting
motivation condition. Further, O1 and O2 represent
the two orders in which the motivating conditions
were presented, S1 to S32 the 32 students, and the
symbol X a score on the dependent variable. Note:
symbols to differentiate the two sexes, the two
test forms and the four age groups were not
needed as these three variables were not included
in the analysis of variance calculations.



[Diagrams 13a, 13b, 13c, and 13d: four schematic layouts
of orders (O1, O2), motivating conditions (M1, M2),
students (S1 to S32), and scores (X).]



Note: For ease of representation, the dots are
used to signify the omission of some of
the students and their scores.

13. Diagrams 13a and 13c are equivalent and correct
ways to illustrate the analysis of variance design
which was used. In both diagrams note that a
different set of 16 students belongs to each
order-of-motivating condition. (In technical
jargon, the variable, student, is nested within
the variable, order-of-motivating condition.)
Further, note that each person provides scores
under both motivating conditions. (In technical
jargon, the variable, student, is crossed with
motivating condition.)

Diagram 13b is not a correct representation
because each student is shown receiving only
one of the two motivating conditions and as
contributing but one (rather than two) scores
on each dependent variable. Design 13d has
each student contributing four scores on
each dependent variable and has him receiving
the motivating conditions under both orders.
It too is incorrect.
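
The nested and crossed relations described above can also be written out as a small data layout. This is a sketch only: the split of S1 to S16 into order O1 and S17 to S32 into order O2 is an assumption made for illustration (the source says only that a different set of 16 students belongs to each order), and the variable names are hypothetical.

```python
from itertools import product

students = [f"S{i}" for i in range(1, 33)]  # S1 .. S32
# Assumed split: which students fall under which order is not given.
orders = {"O1": students[:16], "O2": students[16:]}
conditions = ["M1", "M2"]

# One row per observed score X: (order, student, motivating condition).
layout = [
    (order, student, condition)
    for order, group in orders.items()
    for student, condition in product(group, conditions)
]

# Nested: every student appears under exactly one order.
orders_per_student = {
    s: {o for o, s2, _ in layout if s2 == s} for s in students
}

# Crossed: every student appears under both motivating conditions.
conditions_per_student = {
    s: {m for _, s2, m in layout if s2 == s} for s in students
}
```

A layout like the one attributed to Diagram 13b would instead give each student a single condition, and one like 13d would place each student under both orders; neither property holds for `layout`.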



14. On p. 354, under the subheading "Design," the
researchers mention both an age effect and
an "age by motivational condition interaction
effect." If there were an age effect in this
study, it would mean that the average creativity
test scores for the several age groups differed;
that is, children of different ages, as a group, did
not do equally well on the creativity tests.

One of the key concepts of empirical research
is the interaction between two variables. What
would have to be true about the creativity test
scores of the children if there were an "age by
motivational condition interaction effect?"
(The purpose of this question and the discussion
to follow is to help you be clear about the
meaning of the term, interaction, rather than
to ask you about your opinion whether it is
reasonable to expect such an interaction.)






14. The presence of an age by motivational condition
interaction effect would mean that the differences
in creativity test performance under the two
motivating conditions would vary among the
several age groups. In other words, such an
interaction effect would mean that the difference
in the effect (on test performance) of the kind
of activity interrupted depends upon the age of
the child involved.

Student Responses

Interaction between motivational condition and
age occurs when "motivational condition affects
each age differently."

"It would tell that the effect that motivational
condition had was not the same for all age levels."

"The older the child the greater the difference
between the two test scores."

"Each age group's amount of score change (under
the two conditions) is different from that of
each of the other group's."

Our Reply

These responses are essentially correct.

Student Responses

"If there was an age by motivational condition
effect, the scores would differ depending on
the age of the child."

"As one grows older, his creativity scores
increase (go higher) or vice versa."

"The older the child, the higher the test
scores would be."

"The scores would vary from age to age."

Our Reply

These responses are incorrect. They describe
what would be true if there were an age effect,
but they do not describe an interaction effect
between age and motivational condition on test
score.
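
The distinction between an age effect and an age by motivational condition interaction can be seen in two small tables of group means. The numbers and names below are invented for illustration and are not taken from the study; the sketch also looks only at the means themselves and ignores sampling variability, which a real significance test would have to take into account.

```python
# Hypothetical mean scores by age group and condition (invented numbers).
age_effect_only = {
    "younger": {"interesting": 20, "uninteresting": 40},
    "older": {"interesting": 30, "uninteresting": 50},
}
with_interaction = {
    "younger": {"interesting": 20, "uninteresting": 30},
    "older": {"interesting": 30, "uninteresting": 60},
}


def condition_differences(table):
    """Uninteresting-minus-interesting difference within each age group."""
    return {
        age: means["uninteresting"] - means["interesting"]
        for age, means in table.items()
    }


def has_interaction(table):
    """True when the condition difference is not the same in every group."""
    return len(set(condition_differences(table).values())) > 1
```

In the first table the older group scores higher under both conditions, but the condition difference is 20 points in each age group: an age effect without an interaction. In the second table the condition difference is 10 points for the younger group and 30 for the older group, which is what an interaction means.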






15. (a) What dependent variables were used in the
study? (b) Is it a good idea to use more than
one dependent variable in a study? Why?

15. (a) The dependent variables are those which
are affected by the values of the other
variables; their values "depend" upon the
conditions under which an investigation is
conducted. In this study, the dependent
variables are measured by the creativity
tests, for the effect on such tests is of
interest. More specifically, three separate
creativity tests were used and two scores
(number of responses and number of unique
responses) were computed for each test.
However, for use in the analysis of variance,
the researchers added the number of responses
from all three tests to form a new composite
variable. They also computed a total uniqueness
score by adding the unique response scores for
the three tests. These latter scores are
reported in the paragraph below Table 1 on
p. 355.

Student Responses

"The dependent variables were the interesting tasks
and the uninteresting tasks."

"Conditions before testing, interesting or
uninteresting."






••Mot i vat I o 



Our Reply

These responses are not correct. The nature of
the interrupted task (that is, the motivating
condition) was the primary INDEPENDENT variable
of the study whose effect on the dependent variables
was being studied.



Student Responses

"Score change was the dependent variable."

"The changed scores between the two tests."



Our Reply

It is reasonable to think of the dependent
variables as change scores on the several indices of
creativity. For example, the statistical test
of the difference between creativity scores
under the two motivating conditions is
equivalent to the statistical test of whether the
mean change score is zero.
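
The equivalence just stated can be checked numerically on a toy data set. The sketch below is an assumption-laden illustration, not the researchers' analysis: the six pairs of scores are invented, and the order-of-conditions factor that the study's design included is left out, so this is a plain repeated-measures layout with two conditions. For such a layout the analysis of variance F for condition equals the square of the t statistic for testing that the mean change score is zero.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for six children under each condition (invented).
interesting = [30, 28, 35, 40, 25, 33]
uninteresting = [55, 50, 61, 58, 49, 60]
n = len(interesting)

# Test of "mean change score is zero": one-sample t on the changes.
change = [u - i for u, i in zip(uninteresting, interesting)]
t = mean(change) / (stdev(change) / sqrt(n))

# Repeated-measures analysis of variance on the same scores.
all_scores = interesting + uninteresting
grand = mean(all_scores)
ss_total = sum((x - grand) ** 2 for x in all_scores)
ss_condition = n * (
    (mean(interesting) - grand) ** 2 + (mean(uninteresting) - grand) ** 2
)
ss_student = 2 * sum(
    ((i + u) / 2 - grand) ** 2 for i, u in zip(interesting, uninteresting)
)
ss_error = ss_total - ss_condition - ss_student

# One degree of freedom for condition, n - 1 for error.
F = (ss_condition / 1) / (ss_error / (n - 1))
```

Apart from rounding, `F` and `t ** 2` agree, so the two tests lead to the same decision.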

15. (b) Although there are some inconveniences
and possibilities for contamination, on balance
we approve strongly of multiple dependent
variables in a study since it is possible that the
effects being sought will show up for some
dependent variables and not for others. A
study of the pattern of these results can
provide a more complete insight into the
phenomena under consideration.

Student Responses

"Using more than one dependent variable is a
good idea because it gives a better check on
treatment effects."

"Yes, you have a stronger case for
generalizability when you use more than one test."

"Yes, it is well to use more dependent
variables in order to get more information."

"Yes, especially with a concept like creativity
where a definite universal instrument is not
available."

"Using several dependent measures is an
efficient way of collecting a lot of data
at once. Also, if the variable measured is
not well defined, as is the case here, using
more than one measure provides a way of
converging on the concept under consideration."

Our Reply

We concur with these reasons.






16. On p. 354 under the main heading RESULTS, is
written: "Motivating condition. The F for this
variable was 51.56 and was significant beyond
the .01 level." (a) What was significant?
(b) What does it mean to be "significant beyond
the .01 level?" (If you have not studied
statistics, you probably will not be able
to answer these questions. Nevertheless, you
should study our discussion for it is intended
to help you understand frequently used
statements like the one quoted above.)

16. (a) Strictly speaking it is the value of F
which is significant. (F refers to a statistic
computed as part of the analysis of variance.)
Also, the 25 point difference in mean number
of responses produced under the uninteresting
(57.09) and interesting (32.05) conditions
was "significant" in the statistical sense
of the word.

(b) If the null hypothesis of no difference
in the means of the test scores obtained under
the two motivating conditions were true, then
the probability of obtaining the size
differences reported in Table 1 (or differences
even more extreme) is less than one chance in
a hundred. In this case, significance means
rejecting the notion of equal group means in
the population. "Beyond the .01 level" means
that the probability is less than (i.e.,
"beyond") .01 that sample results as extreme
as those found would occur if the no difference
hypothesis were true.

Student Responses

To be significant beyond the .01 level means
that "only 1% of the time will such (extreme)
results occur because of chance or sampling
error."

"The probability is less than 1% that the
observed data (or those more extreme) could
have occurred only by chance."






Our Reply

These interpretations are correct. Note that
they discuss the probability of the observed
data occurring if something (chance alone
operating) were really true. A statement of
a student that "the (expression) indicates
such results as these would happen less
than 1% of the time" is correct as far as it
goes but it needs the qualifying phrase,
if chance alone were operating, to be
completely correct.

Student Responses

"Significant beyond the .01 level means that
the probability of results having been
influenced by chance is less than 1%."

"The probability for the chance could be
happened less than 1%." (sic)

"Less than 1% possibility that results were
obtained by chance."

"It means that less than .01 of the time
chance will be the only causative factor."

Our Reply

The above responses and their variants are the
most frequently made, and they are not correct.
Equally incorrect are statements that you are
99% confident that chance alone was not operating,
e.g. "the chance that differences in creativity
scores are caused by manipulation of motivating
conditions, and not by chance, is at least 99/100."

The difficulty with these responses is that they
state the probability that something is really
true beyond the sample results. (In this classical
use of probability, either chance alone was
operating or it wasn't; the probability is
either 1 or 0.) You should carefully compare our
initial answer to this question and the first set
of student responses which we said were correct
to the set of student responses directly above
which we labeled as incorrect. The former give
the probability of sample results given a correct
chance-alone hypothesis (the correct
interpretation); the latter give the probability
of the chance-alone hypothesis being correct
given the sample results which were found (the
incorrect interpretation).
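
The correct reading can be illustrated by computing such a probability directly. The sketch below swaps in a simple sign-flip randomization test for the analysis of variance F test reported in the study, and the eight change scores are invented for illustration. It asks: if chance alone were operating, so that each child's change was as likely to be negative as positive, how often would a total change at least as extreme as the observed one occur?

```python
from itertools import product

# Hypothetical change scores (uninteresting minus interesting)
# for eight children. Invented numbers, not data from the study.
change = [25, 22, 26, 18, 24, 27, 19, 30]
observed = sum(change)

# Under the chance-alone hypothesis every pattern of plus and minus
# signs on the eight changes is equally likely. Enumerate all 2**8.
totals = [
    sum(sign * d for sign, d in zip(signs, change))
    for signs in product([1, -1], repeat=len(change))
]

# P(result at least this extreme | chance alone), two-sided.
p_value = sum(abs(total) >= abs(observed) for total in totals) / len(totals)
```

Here only the all-plus and all-minus patterns are as extreme as the observed total, so `p_value` is 2/256, which is below .01. The quantity computed is the probability of the sample results given the chance-alone hypothesis; nothing in the calculation yields the probability that the chance-alone hypothesis is true.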



17. Although admitting that norms are not required
to test the hypothesis of the study, one student
suggested that if national norms for the creativity
tests had been reported by the investigators, we
could see whether the uninteresting task was
responsible for better-than-expected performance
or, alternatively, whether the interesting task
was responsible for poorer-than-expected
performance. Do you agree? Explain.

17. We do not agree. Because the WOIS students in
the study may not be typical of the test
standardization group, we cannot determine how
the WOIS students might have scored, without
any unusual pretest conditions, compared with
a norm group. Their mean score might have been
either lower or, more likely, higher than the
norm group mean. Since we cannot establish that
the norm group mean and the WOIS group mean under
normal testing conditions would be the same, any
comparison of the test results with norm group
scores would be a meaningless endeavor. For
example, if the uninteresting task group scores
above the norm group mean and the interesting
task group scores at the norm group mean, it could
be that (a) the uninteresting task spurred the
students on to better-than-expected performance
or that (b) the interesting task lowered the
performance below the level expected of WOIS
students.

Somewhat aside, it might have been useful to
have a third matched group from the same WOIS
population take the tests under standardized
administration conditions. Such a third group
(a) could help determine if the uninteresting
task had a positive effect, the interesting
task a negative effect, or both, and (b) could
provide data on the typicalness of the WOIS
children on the creativity measures. However,
an investigator cannot study all the questions
he/she might like to, or, in a single study,
cannot gather all the data of some benefit.
Priorities must be set. We do not criticize
the researchers of this study for failure to
include such a control group.




Student Responses

"What conditions were the norms obtained under?
Conditions of motivation were probably not
considered for the norms; therefore, they are not
relevant to this experiment."

"No, norms would not be useful because they were
not derived under the same experimental conditions
as the study."

Our Reply

These students seem to miss the point of the
question. It is recognized that the context
of testing was different between that found
in the study and that present when the tests
were normed. The question asked whether,
therefore, the difference between the WOIS
and norm results could tell us anything about
how the context of testing affects test
performance, specifically whether it tended
to raise the results (of one of the motivating
groups) or lower the results of another. For
the reasons given in our initial answer, we
concluded that the norm information would not be
of much value.



18. About the middle of p. 355 the researchers
speculate why, contrary to all the other
children, two children gave more responses
when taken away from an interesting task
than when taken away from an uninteresting
task. Should they have made this kind of
speculation in a paper of this type? Comment.

18. Definitely. The purpose of research is to
explain phenomena. It is quite proper, in
fact laudatory, that the investigators share
their insights with the reader even though
they cannot prove their claims. It is
considered good research form to separate
speculation and after-the-fact opinionating
from the line of theorizing to which the
study was specifically directed. The
investigators have clearly made this
division.




Student Responses

"Yes, it seems reasonable to suggest a possible
reason for a result that doesn't 'fit'. It
might lead to further investigation, just as
their original speculation led to this study."

Our Reply

We agree.

Student Responses

"This speculation was not necessary, especially
since the investigators did not state any
attempt toward age-group analysis."

"... suggests that age should have been one of
the variables."

"Although the age variable was not incorporated
into the design ..."

"If they want to replicate, they may want to
consider the age factor."



Our Reply

Although age was not included in the analysis of
variance calculation, the creativity test data
associated with the two motivation conditions
reported in Table 1 were further subdivided by
age groups. Contrary to the student responses
quoted above, the age factor was a variable in
the design and was considered.



19. A bit further down on p. 355 the investigators
write: "...nor were there significant sex
differences with respect to the motivational factor."
To what kind of significance (statistical or
practical) are the investigators referring here?
Or can't we tell?

19. The type of significance is not clear. Not
enough information is given to answer the
question with certainty. The researchers
could be referring to statistical
significance although they do not report conducting
any significance test of such an hypothesis.
On the other hand, the investigators could
merely have noted the differences in mean
scores under the two motivating conditions
for each sex and concluded, without conducting
a statistical test, that the differences of these
differences were not of practical significance
(that is, were not very important).

Student Responses

(a) "I think the significance they were
referring to is the fact that girls tend to do
better on tests requiring verbal responses."

(b) "No significance differences is an
interesting observation to make because I would
guess girls to be more creative than boys."

(c) "I assumed the researchers were saying that
there was no difference in the motivation
of boys and girls."

Our llef)jy 

These responses illustrate a confusion about the
differences being discussed. Regarding response
(c), differences in motivation are not involved,
for motivational condition (type of task
interrupted) is an independent, not a dependent,
variable which is assigned to the children. By
no significant sex differences with respect to
the motivational factor the investigators are
referring to a lack of interaction between sex
and motivational condition. That is, they are
not claiming that boys and girls scored about
the same on the creativity tests (as implied
in response b), but rather that the differences
under the two motivating conditions for boys
and the corresponding differences for girls were
themselves not significantly different.
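
The "difference of differences" idea can be sketched in a few lines of Python. The cell means below are invented for illustration and are not the values reported in Table I; the function names are likewise our own. The sketch only shows which quantity an interaction refers to and says nothing about whether such a quantity is statistically significant.

```python
# Hypothetical cell means (NOT taken from the study): mean creativity
# scores for each sex under two motivating conditions, "A" and "B".
cell_means = {
    ("boys", "A"): 12.0,
    ("boys", "B"): 20.0,
    ("girls", "A"): 15.0,
    ("girls", "B"): 24.0,
}


def condition_effect(means, sex):
    """Difference between the two motivating conditions for one sex."""
    return means[(sex, "B")] - means[(sex, "A")]


def interaction_contrast(means):
    """Difference of the differences: the sex-by-condition interaction."""
    return condition_effect(means, "girls") - condition_effect(means, "boys")


def sex_main_effect(means):
    """Overall girls-minus-boys gap, averaged over both conditions."""
    girls = (means[("girls", "A")] + means[("girls", "B")]) / 2
    boys = (means[("boys", "A")] + means[("boys", "B")]) / 2
    return girls - boys


if __name__ == "__main__":
    print("condition effect, boys: ", condition_effect(cell_means, "boys"))
    print("condition effect, girls:", condition_effect(cell_means, "girls"))
    print("interaction contrast:   ", interaction_contrast(cell_means))
    print("sex main effect:        ", sex_main_effect(cell_means))
```

With these made-up numbers the girls score higher overall (a sex main effect), yet the effect of the motivating condition is nearly the same for both sexes (a small interaction contrast). A statement of "no significant sex differences with respect to the motivational factor" concerns the second quantity, not the first.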



20. In the abstract and at the bottom of p. 355, the
researchers state that one group was almost twice
as creative as the other. What assumption about
test scores is necessary to justify this remark?






20. The assumption is that stating twice as many
uses for objects (the results reported by the
investigators) means having twice the creativity.
(In more technical language, the assumption is
that the test scores measure creativity at the
measurement level called a ratio scale.) The
sentences in question would have been more
accurate and less misleading if they had been
worded either in terms of "twice as many
responses" or "significantly more 'creative'."

Many students said that "the tests must be
valid measures of creativity."

Our Reply

This is correct, as far as it goes. More than
validity is required, however, before we can
make the twice-as-creative interpretation.
Our thermometer is a valid measure of temperature,
but we would not say a reading of an
outside temperature of 6° indicates twice
the heat of a reading of 3°. (0° does not
indicate absolute lack of heat just as zero
number of responses on the creativity test
does not indicate absolute lack of creativity.)
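
The thermometer example can be checked numerically. The sketch below assumes readings of 6 and 3 degrees Celsius (the text does not say which scale is meant) and uses function names of our own choosing. It shows that the "twice as much" ratio disappears as soon as the zero point is moved, which is the mark of a scale that is not a ratio scale.

```python
def celsius_to_fahrenheit(celsius):
    """Same temperature expressed on a scale with a different zero point."""
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius):
    """Same temperature on a scale whose zero means absence of heat."""
    return celsius + 273.15


def ratio(a, b):
    """How many times larger reading a is than reading b."""
    return a / b


if __name__ == "__main__":
    warm, cool = 6.0, 3.0  # assumed to be degrees Celsius
    print("Celsius ratio:   ", ratio(warm, cool))
    print("Fahrenheit ratio:", ratio(celsius_to_fahrenheit(warm),
                                     celsius_to_fahrenheit(cool)))
    print("Kelvin ratio:    ", ratio(celsius_to_kelvin(warm),
                                     celsius_to_kelvin(cool)))
```

The two temperatures are the same in every case; only the placement of zero changes, and the ratio changes with it. The parallel claim about creativity would require that a score of zero responses mean no creativity at all.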

Student Responses 

"The standardized norms of a test have to be
known before one can make the assumption that
one group was twice as creative as the other."

Our Reply 

We disagree. Although having norms would permit
us to make a comparison with the performances
of such a standardized sample, they would not,
by some mysterious process, give to the scores
this ratio scale property about which we spoke.



21. In the top paragraph of p. 356, the investigators
imply relevance of their study to the
creativity-intelligence dichotomy. Is their
study relevant in this regard? (See special
note about p. 356.)






21. We know of no relevance of this work to the
creativity-intelligence dichotomy and are at
a loss to explain why any reference to it was
made.

Student Responses

"While this study does not disprove the
creativity-intelligence dichotomy, it
implies that motivating factors may be as
important or more important in generating
creative responses."

Perhaps the study has relevance to the
creativity-intelligence dichotomy "in the
sense that a relationship between creativity
and intelligence should take motivation into
account."

"Tangentially relevant to the larger problem
of defining 'creative ability' distinct from
'intelligent behavior'."

Our Reply 

Although the above student responses have merit,
at best they only make a case for a most indirect
kind of relevance that the study might have to the
question of whether intelligence and creativity
are distinct traits.

22. What would you say is the main conclusion of
the study?

22. The investigators would probably claim that
their main conclusion is that their results
"...highlight the importance of considering
motivational context effects whenever we
evaluate psychological or educational test
performance." (This conclusion can be worded
many ways and still retain its essence:
that test performance depends upon motivation
or, in less abstract terms, that the
type of task engaged in prior to testing
can markedly affect a child's measured
creativity.) Regardless which wording you
prefer, because of many weaknesses (especially
those discussed in the general critique) we
cannot assess this study as a rigorous
examination of motivational context effects.




Student Response 

"A valid conclusion cannot be drawn from an
invalid study."

Our Reply

We disagree. For example, fortune tellers
are frequently right, especially when
predictions are made which agree with one's
expectations. One student, frustrated by
our answer to question 22, said: "But I
feel that we can and should consider the
motivational factor in psychological
testing." We feel that way too, but our
conviction was but slightly strengthened
by this particular investigation.



Jason Millman and D. Bob Gowin
Cornell University






Appendix IX 

Joanne Reynolds Bronars

Tampering With Nature in Elementary School Science

The Educational Forum
November 1968






1. Bronars is responding to the need to
undertake, "A careful examination of
the assumptions underlying experimentation
with living things in the elementary
school science program." What
common name or classification do we
give to this kind of critical analysis?

Usually we think of such studies as
philosophical research. Somewhat aside, we
would like to point out that one of the
traditional tasks taken on by philosophers of
education has been the examination of
educational theories and practices so that the
basic assumptions and values inherent in
them may be uncovered and clearly displayed
for all to see. The Bronars article follows
this traditional form of philosophical
research, as it presents an aspect of the
elementary school science curriculum and probes
beneath the surface of a set of particular
activities to ask normative questions about
what values we may inadvertently teach by
engaging students in such activities.






The obvious audience to which this and a
host of similar philosophical articles are
addressed is the educational practitioner,
forcing him to be reflective about his
practice, not in terms of its efficiency or
technical propriety, but more fundamentally in
terms of its broadly human and ethical
dimensions. Since the time of Socrates,
philosophers have served as such "gad-flies" to
force the public and personal reflection upon
our basic values, beliefs, and attitudes,
and to thereby bring us to lead the "examined
life." Especially in so basic a human activity
as education, such an examination is essential
to allow us to consider wisely what we
are about in terms of its deepest dimensions
and far-reaching ramifications for the
nurturing of human beings in the ways of
civilized life.






2. Is the Bronars article an educational 
research paper? 

Yes. It is a study of educational practices,
but it is not primarily an empirical
study (i.e., it is theory-based rather than
experiment-based). The predominance of
empirical research in education, and the
consequent stress on the methodologies of such
research, seems to lead many people to
believe that only empirical studies conforming
to certain methodological norms are properly
called "research." Typically, philosophical
research continues the oldest tradition
of research, that based on careful
observation of the world and reasoned thought
about it.






3. One form that logical arguments about
educational practice can take is the
practical syllogism. This form usually
has three parts: 1) the normative
premise(s), i.e., a statement of what is
good; 2) the empirical claims or alleged
facts in the case; and 3) the value
judgments or conclusions about what
should be done. It is never explicitly
stated in the Bronars article but one
possible argument is the following:

Normative premise: Reverence for life is
a good thing.

Empirical claims: a) Many elementary
school teaching practices in use today
do not instill a reverence for life.
b) There are educational practices
available which do instill a reverence for
life.

Conclusion: Adopt these preferred
practices.






Does the fact that this argument contains
normative judgments make the argument
invalid?




No. It is a valid argument. Since the
conclusions follow from the premises, we say
that the argument is valid. The facts as
claimed or alleged, however, may not be true
as stated. (Note: Logical validity is
not the same concept as empirical [fact-based]
validity. It is unfortunate that
the language of research uses the same
term, "validity," in two very distinctly
different ways.)



4. The article contains the recommendation
to change the orientation of elementary
science programs from experimentation
with living things to observation
of them. Is this change necessary
in the light of the normative
premise, "reverence for life is a good
thing?"

No. We agree with Bronars who answered
our question (personal communication) as
follows:

"The normative premise is not that
of unqualified reverence for life but
rather the importance of a developed
attitude toward nature which involves
a sense of purpose and responsibility.
The point is not that experimentation
should not be carried on, but that
when it is carried on it is for the
purpose of thoughtfully conceived ends
which adults have assumed responsibility
for achieving. That is why I am
suggesting that the focus be upon
observing where children are concerned."



5. The investigator considers three assumptions
people use in support of practices
that "tamper with nature." If we assume
that her arguments against them are
conclusive, does such a refutation of the
assumptions conclusively support her
main argument? Why or why not?




No. They are logically independent.
That is, a person could either agree or
disagree with each assumption and still
either agree or disagree with
recommendations for educational practice.

Even if all three assumptions are
rejected, as Bronars rejects them, a person could
still agree or disagree with her educational
recommendations.



6. Bronars is concerned with what children
learn when they have learning experiences
involving the killing of flies and
grasshoppers. To what kind of learning might
she have appealed to support her argument?



Many empirical researchers and educational
thinkers have commented on the notion of
incidental or collateral learning. It is always
appropriate to ask what else children are
learning when we teach them. The Bronars article
stimulates us to ask if we are teaching
children to disregard reverence for life when we
use living organisms as subjects of
experiments in school. We would expect empirical
research to show that in some cases we do
engender the "wrong" belief systems through such
experiments.






7. Does this article contain any data?

Yes. Check page 277 where Bronars reports
responses obtained from college students which
indicate a continuum of attitudes toward
living things, plus the reasons which justify
these attitudes. She also quotes a datum
from the New York Times about the
availability of living creatures from a publishing
company. She also reports other information
that is properly considered data.



8. Bronars takes exception to some of the
present classroom practices in elementary
school science. a) Upon what sources
of information does Bronars draw to
describe these practices? b) Is there
any reason to doubt the validity of her
description of these classroom practices?
c) Is it important that her
description be valid?

a) Elementary school science textbooks.

b) Units recommended by textbook writers
may not be the ones actually used in the
classroom. Observation of classrooms or reports
of activities actually taking place in
classrooms would be more valid indicators of
classroom practices.






c) Yes and no. If the practices Bronars
is complaining about occur only
infrequently, then the article no longer has much
practical significance. On the other hand,
as long as some teachers behave as described
(which is most assuredly the case), then the
validity of the article turns not on the
frequency of these "objectionable" practices
but on the clarity and coherence of the
arguments.



9. Does the information in the second half 
of page 277 help Bronars to reject Assump- 
tion #2 on page 276? 

The data will help to the degree that the 
responses made by college students and refer- 
red to in the article will generalize to chil- 
dren. Bronars' data have force only to the 
degree that we assume children would respond 
in a similar way. 






10. Write out Bronars' definition of the word
"pest." Most primary dictionary definitions
call attention to the historical
origin of the word, and define "pest" as
any organism capable of causing a fatal
disease in epidemic proportions. Obviously
her definition differs from the
primary definition of most dictionaries.
Characterize this difference and discuss
its importance in terms of Bronars'
argument.

Bronars defines "pest" as "something which
causes inconvenience to the one employing the
term." Thus Bronars treats "pest" as an
evaluative term; the primary dictionary definition
is descriptive.

Bronars argues that what some adults (e.g.,
science text writers) consider to be pests
and worthless, other people (such as teachers
and children in their classes) may not
consider as pests and, therefore, should not be
harmed. This argument requires that "pest"
not be considered a descriptive term (in which
case there would be widespread agreement).
Rather, her argument requires an evaluative
definition so that "we cannot describe
certain living things as pests per se"
(p. 277). Bronars stipulates her definition
of pest, and this definition contains
within it the evaluative phrase,
"inconvenience to one employing the term."
She has chosen one meaning of "pest" over
other meanings readily associated with the
term without giving explicit reasons for
rejecting the alternative (and competing)
meanings. The science textbook writers
would be equally justified in asserting that
for them "pest" is a descriptive term applied
to organisms that cause fatal diseases and
epidemics.



However Bronars responds:

"While the primary dictionary
definition of the term 'pest' is
descriptive I wished to draw attention
to the evaluative one. I
agree that I should have spelled
out my reasons for doing so. In
the same way, however, the textbook
writers need to explain their
use of the term. As the experiment
is set forth the fly is not
killed because he is a 'pest'
(descriptive) but because it is assumed
that no one will object to its
being used as a victim. There are
other attitudes towards flies,
however, as seen in some of the
Scientific American articles on their
life cycle. Here reference is made
to their beauty and to other kinds
of characteristics."

We might add to this by quoting Uncle
Toby's reaction to flies, from Tristram
Shandy:

"- Go - says he, one day at
dinner, to an overgrown fly which
had buzzed about his nose, and
tormented him cruelly all dinner-time -
and which, after infinite attempts,
he had caught at last, as it flew
by him; - I'll not hurt thee, says
my Uncle Toby, rising from his chair,
and going across the room with the
fly in his hand, - I'll not hurt a
hair of thy head: - Go, says he,
lifting up the sash, and opening
his hand as he spoke, to let it
escape; - go poor devil, get thee
gone, why should I hurt thee? - This
world surely is wide enough to hold
both thee and me."

The difference between evaluative
and descriptive definitions of terms isn't
very important with regard to Bronars' paper.
It is however generally an important point.
Too often in educational research, where the
value issues continually impinge on every
significant problem, we find this slippage
between a descriptive and evaluative
definition of some key term. The shift of
meanings is often very subtle, and is something
one should be constantly on guard to catch.

11. Bronars writes, "Pain is a philosophical
concept, not a publicly observable
phenomenon." (p. 277, paragraph 2). Give
reasons for accepting or rejecting this
statement.





Here is one reason why we might want to
reject the statement as it stands: The
statement claims that pain is a philosophical
concept. It seems to us that pain is no more a
philosophical concept than it is a physical
concept, or a medical concept, or a concept
of ordinary human experience. It is a
feeling. There are many different contexts in
which the term "pain" is used to refer to this
feeling. However we might want to accept the
general sense of the statement because we can
make a distinction between a concept (and its
sign, such as a word, a gesture, a mark) and
that to which the concept refers. Concepts
which are relatively rich have attached to
them a cluster of criteria (sets of meaning)
which we use in correctly applying the term.
There is an important sense in which it is
appropriate to say that we do not "see a
concept." We can, however, reach agreement
about what it is the concept refers to, i.e.,
what is observed. Thus, in common medical
practice, doctors reach agreement about pain,
the threshold of tolerance for pain, the
effectiveness of drugs and other treatments
to reduce pain, and so on.



12. Bronars writes: "All we can do is to
state a value position and invite
children to consider it. The teacher's
right to compel children to accept it is
a moral question..." (p. 277, paragraph
1). Yet the tenor of her article
suggests that "reverence for life" must be
taught to children. Is she logically
inconsistent? Why or why not?

At first glance it might appear that she
is being logically inconsistent. Bronars
states as a fact (p. 277, paragraph 3) that
there are a variety of feelings which children
have about living things. Thus, presumably,
some could have notably tougher ideas about
living things than Bronars might wish. To
suggest that "reverence for life" must be
taught to these tough-minded children implies
that the teacher needs to go beyond merely
inviting them to consider this value.




In fact, though, she is not being incon- 
sistent. To suggest that something must be 
taught in schools does not entail the sugges- 
tion that children must be compelled to accept 
it. 



13. Bronars suggests that science study be
focused on observation of living things
in their natural habitat. She also
suggests that learning with actual objects
(i.e., living organisms) may not be as
effective as learning with representative
materials. Is there a contradiction
in these two suggestions?

Again, she is not being inconsistent. To
explain why we might best quote her own
response (personal communication): "Reference
is made to the kinds of science study that
would best be carried on through the use of
field observation techniques (p. 277) and
that would best be carried on through the
use of representative materials (p. 279).
There is no contradiction but rather a
reference to different kinds of phenomena."






14. What is Bronars' main question about
effects of the educational practices
examined in her report? Briefly
sketch how this question might be
answered empirically.

The main concern of the paper seems to
be the relation between certain activities
in elementary science practice and two
related values: a) attitudes of children
concerning reverence for life, and b)
attitudes of children toward the balance of
nature.






An empirical study comparing these atti- 
tudes in children who both have and have not 
been exposed to the practices of elementary 
school science which are being questioned 
here might help determine the effects of 
these practices upon such attitudes. 




However, it must be stressed that Bronars'
arguments cannot be "validated" or "disproved"
by any possible result of such an experiment.
She wants to argue that classroom activities
that involve the heedless and casual killing
of living things are wrong in themselves. If
we ran a test that discovered killing people
did not seem to affect people's attitude to
human life we could hardly claim to have
shown that killing people is all right.




Appendix X

Edwin M. Bridges, Wayne J. Doyle, and David J. Mahan

Effects of Hierarchical Differentiation on
Group Productivity, Efficiency, and Risk Taking

Administrative Science Quarterly, September 1968, pp. 305-319




SPECIAL NOTE



In both the Results and Discussion sections, the investigators discuss the
use of one-tailed statistical tests (as opposed to two-tailed tests). These two
types of statistical tests are frequently used to analyze the type of data
presented in this paper. When a researcher tests the statistical significance of
the difference of the mean scores for two groups, he calculates those differences
(called rejection regions) which, if they occurred, would be so large as to cause
him to reject the hypothesis of no difference in group means in the population,
that is, to reject the hypothesis that the differences in means are due only to
chance. If a researcher is willing to consider large observed differences as
reason to reject this hypothesis of no difference regardless of which group had
the higher mean, the researcher is conducting a two-tailed test. (The rejection
regions are at the two tails of a distribution of expected differences.) If, as
the investigators of this paper have done, only large differences in favor of a
specific group will lead the researcher to reject the no difference hypothesis,
then a one-tailed test is being conducted. When a one-tailed test is used,
finding a difference in favor of the group NOT expected to be superior will not
permit the researcher to reject the chance alone hypothesis, no matter how large
that unexpected difference is.

There is debate among statisticians over the appropriateness of one-tailed
tests. The point to keep in mind is that when one-tailed tests are used, smaller
group differences are needed to reject the chance alone hypothesis provided, of
course, the differences are in the direction hypothesized. This is true because
one large rejection region is used rather than two smaller ones. Had the
researchers used a two-tailed test, the differences in efficiency scores (hypothesis 2)
and in risk taking scores (hypothesis 3) would not have been statistically
significant at the .05 level.
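
To make the point about rejection regions concrete, here is a minimal sketch in Python. It assumes a test statistic that follows the standard normal distribution (the article itself may have used a different test), and the z values of 1.80 and -3.00 are invented for illustration rather than taken from the paper. The function names are ours.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_value(alpha, tails):
    """Cutoff beyond which the chance alone hypothesis is rejected."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha between the two tails, so each
    # rejection region is smaller and the cutoff lies further out.
    return STANDARD_NORMAL.inv_cdf(1 - alpha / tails)


def rejects(z, alpha=0.05, tails=2):
    """True if the observed statistic z falls in a rejection region.

    For a one-tailed test the hypothesized direction is taken to be
    positive, so a negative z never leads to rejection.
    """
    cutoff = critical_value(alpha, tails)
    if tails == 1:
        return z > cutoff
    return abs(z) > cutoff


if __name__ == "__main__":
    print("one-tailed cutoff:", round(critical_value(0.05, 1), 3))
    print("two-tailed cutoff:", round(critical_value(0.05, 2), 3))
    for z in (1.80, -3.00):  # hypothetical observed statistics
        print(z, "one-tailed:", rejects(z, tails=1),
              "two-tailed:", rejects(z, tails=2))
```

Under these assumptions a statistic of 1.80 in the predicted direction is significant at the .05 level with a one-tailed test but not with a two-tailed test, while a statistic of -3.00 (a large difference in the unexpected direction) is rejected only by the two-tailed test.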




The article is divided into the following sections: Introduction, Hypotheses,
Method, Results, Discussion and Concluding Remarks. Evaluate the
article critically, organizing your remarks into six groups, to correspond to the
six sections of the paper listed above. Be sure to cite strengths as well as
weaknesses.




A MODEL APPRAISAL

Introduction

1. Although brief, the introduction is a good one. It provides a clear idea
of the content of the paper and makes a case for its significance. We believe
the general problem is an important one, particularly in the present times of
doubt about authoritarian forms in many kinds of organizations from communes
to private industry, from educational institutions and classroom groups to
bureaus of the government. Further, the criteria used for judging forms of
organization (i.e., productivity, efficiency and risk taking) are important ones.

2. Student Response. "A peer group is not necessarily undifferentiated. Peer
groups hold their internal differentiation and the study does not take this into
consideration."

3. Our Reply. The author does write, "undifferentiated groups, i.e., peer
groups." We agree completely with the student's remark and make this point
ourselves in another context (see paragraph 15 of our model appraisal).

4. Student Responses. The Introduction is poor in that the investigators,
"did not present a review of the existing research," and, "failed to define
the hierarchically differentiated and undifferentiated groups."

5. Our Reply. Although we agree that it is important to review existing
research and to define key concepts, we do not believe that it is necessary to
do these things in the Introduction. The authors do refer through footnotes
to the work of others in which the concept of hierarchical differentiation is
described.

6. Student Response. "The terminology was so involved that it was difficult
to wade through."

7. Our Reply. Many students made similar statements, not only in regard to
the Introduction but in reference to other sections as well. The investigators
do have an obligation to communicate clearly; but we must remember this article
is not meant for consumption by the general public. The language of science
cannot be the same as everyday language for the latter is too imprecise. On
the other hand, unnecessary jargon can be confusing and some balance is needed.






Hypotheses 

8. The hypotheses section of this paper does not merely list the three
principal hypotheses which guided the investigators' early work on the problem,
but goes beyond to provide a helpful rationale for expecting the result
hypothesized.

9. The investigators should be commended for the way in which they used social
science concepts and theory to guide their research on administrative problems.
This reliance on theory: a) increases the probability that relationships will
be discovered; b) provides a way to explain and to account for differences
when they do occur; and c) facilitates additional inquiry.

10. The investigators hypothesize (#3) that in differentiated groups the
subordinates who generate ideas will hesitate promoting them, and thus fewer
of these generated ideas will be presented by the group to the research worker
(there will be low risk taking). One could argue the opposite as follows:
because of the greater inhibition in differentiated groups, subordinates will
only suggest ideas which they feel can be defended; thus the ideas suggested
in such a differentiated group are more likely to be accepted by the entire
group for presentation to the research worker (there will be high risk taking).

11. Student Responses. "Hypothesis 3 was based on opinion." "The subjective
statements used in the explanation of each hypothesis have not been proven."

12. Our Reply. These student readers evidently believe it not worthwhile to
engage in research whose hypotheses are generated from rationales which are
"opinion" and "not...proven." The line of reasoning behind hypotheses can range
from radically speculative ideas and mere opinion to coherent rationales and
logically tight theories. It may well be true that the payoff of research
depends upon the location of the line of reasoning along this continuum. It is
our judgment that the investigators utilized a thoughtful (if not compelling)
line of reasoning which is much more than unsubstantiated opinion.



13. The investigators chose to use an experimental method rather than a survey
or correlational design even though in the field of educational administration
the tradition of nonexperimental research is especially strong. A more usual
procedure to study the effects of a variable like "hierarchical differentiation"
would be to administer an instrument to first identify school groups which differ
naturally on this variable, and then to compare these groups with respect to the
dependent variables. Our purpose is not to claim that the variable manipulating
experiment conducted by the investigators is superior to the more traditional
status study (although we suspect it is), but rather to highlight the fact that
there is usually more than one way in which a problem can be researched.









14. Sample: 

Ten groups, each consisting of a principal and three teachers, were
classified as hierarchically differentiated. Ten other groups from the same
schools, each consisting of four teachers, were classified as hierarchically
UNdifferentiated. Thus, groups were considered hierarchically differentiated or
undifferentiated solely on the basis of whether or not the principal was present.

15. Whether or not this distinction (hierarchically differentiated vs.
undifferentiated) corresponds to the conceptual definition of "status difference"
was unexamined, partly because no conceptual definition was provided. It is
quite conceivable that something other than "status" was being manipulated
by the investigators, such as "maleness", "personal dominance", "differential
familiarity", or "emergent vs. appointed leadership." Since status systems
exist within teaching staffs, it is not certain that the all-teacher groups,
supposedly without status differences, really differed on this dimension from
the groups in which the principal was present. The investigators would have
been well advised to check the correspondence between the operational and
conceptual definitions, perhaps by means of a post-experiment questionnaire or
interview.

16. It should be noted that the main comparison was between pairs of groups
selected from the same school. Thus, differences between groups could not be
attributed to school differences since the groups were essentially matched
in this regard. The investigators should be commended for insuring that the
basic comparison between the two types of groups was valid, even though
results might not be generalizable to all types of school groups in all localities.

17. Procedures:

Under the section, Procedures, the researchers describe the problem to
be solved (the doodlebug problem) and the methods of administration. The
adequacy of this problem, the decision making procedure, and the role of the
experimenter deserve comment at this point.

18. Problem Adequacy. One should note the difference between the doodlebug
problem presented to the groups and the range of real life problems to which
such groups generally attend. Many of the educational problems faced by teachers
have no clear answer as does the doodlebug problem and we may therefore
question whether results obtained using this special problem can be made more
generally applicable. Closer inspection of the measures generated from the
doodlebug problem will reveal that the problem is used to measure the ability
to overcome normal beliefs rather than to measure problem solving ability in
the usual sense. The doodlebug problem is more like a puzzle than a problem
in decision making.






19. Further, the doodlebug problem was too difficult for use in testing
differences in productivity in the synthesis phase of the task. A pilot
study could have shown this fact.

20. Finally, mention of the three beliefs to be overcome (in paragraph 1, p. 310)
well before describing them (in footnote 8) is a weakness of reporting style.

21. Although the task was, in a sense, artificial and trivial, it does have the
virtues of having been thoroughly studied in previous research, and is of such
a nature that principals should be equally adept as teachers at solving it.
This last point is important, for if the problem were something that principals
could be expected to handle more easily than teachers, the group differences
could be attributed to the particular skills of principals rather than to the
hierarchical differentiation of the group.

22. Although the choice of a suitable problem was a difficult one, we believe
the researchers should have chosen one or more tasks more closely related to
actual school situations.

23. Decision making procedures. For purposes of reaching decisions within
each group involved in the problem solving situation, a parliamentarian
arrangement in which the majority rules was decided upon. This is an unusual
method for school personnel to use for reaching decisions. More likely is the
centralist constitutional arrangement in which a group is bound by a decision
reached by the person in final authority. As recognized by the investigators
themselves (see footnote 14), use of a majority-rule procedure makes it
difficult to explain the results. It is when the centralist arrangement is used
that status hierarchies in groups are expected to matter most because this form
recognizes and utilizes status differences in its operation. Thus, not only is
the generalizability of the study weakened by use of an atypical decision making
procedure, but the very rationale for expecting status differences to be
operating is less applicable to the parliamentarian arrangement and,
consequently, attributing observed differences to status differences in the
groups is very hazardous indeed.

24. Experimenter role. One weakness of the report is its failure to describe
clearly or completely the role of the experimenter during the problem solving
sessions. It is nowhere indicated how many experimenters were used or the extent
to which they had been trained for participation in the sessions. The last
sentence in footnote 13 mentions that the experimenter "clarified" ideas.
Elsewhere it was stated that the research worker gave immediate feedback
(p. 308) and could be asked questions (p. 309). All this suggests that the
experimenters may have had a more active role in the problem solving sessions
than we might believe. It is important for us to know the exact nature of the
experimenters' role more accurately to assess possible experimenter bias (or
more generally, "instrumentation" effects) and the additional restraints that
may have been operating on the behavior of the participants.






Note from the investigators: Actually, we did conduct a pilot study, and thirty
minutes appeared to be ample time for solving the problem. In seeking to identify a
possible cause for the unexpected outcome, we realized that we had blundered. The
populations of subjects were different. Whereas the experimental subjects were
teachers, the subjects in the pilot study were sophomores in the liberal arts college
of a highly selective university. The implication of this situation is unmistakable;
the teachers and principals in our sample were not as able as the college sophomores.
We chose to safeguard the interests (namely the self-esteem) of our subjects by
withholding potentially harmful information. We certainly are not the first
researchers who have wrestled with the choice of what to report and what to withhold;
this decision frequently arises when the objects of research are human beings.

Paragraph 22

Note from the investigators: We share the reviewers' belief that tasks more closely
related to actual school situations should have been used, but feel that they have
slighted a persistent dilemma faced by those choosing an experimental approach to
research. An experimenter hopes to design a study which has both internal and
external validity. A study is said to possess internal validity if the experimental
stimulus did in fact make some significant difference in this specific instance.
External validity refers to representativeness or generalizability. As Donald T.
Campbell ("Factors Relevant to the Validity of Experiments in Social Settings,"
Psychological Bulletin, 54 (1957), 297-312) has noted,

Both criteria are obviously important although it turns out that they are to
some extent incompatible, in that the controls required for internal validity often
tend to jeopardize representativeness... If one is in a situation where either internal
validity or representativeness must be sacrificed, which should it be? The answer is
clear. Internal validity is the prior and indispensable consideration.

In selecting the problem, we sought to identify one in which neither principals nor
teachers would have an advantage. We were not confident that we could develop a
[illegible] could be handled with equal ease or difficulty by principals
and teachers. We, therefore, sacrificed external validity in the interests of






25. Finally, note that it is not made clear how the solutions of the groups
were "passed on" to the experimenter. Did the administrator, when present,
have any special function in the passing on activity?

26. Student Responses. Many students mentioned the failure of the
investigators to give adequate description of the following areas: a) the doodlebug
problem; b) method for selecting the principals; c) method for selecting the
teachers; specifically if they were volunteers and why there were so many
females; d) teacher experience and age; e) effect taping of the sessions had
on inhibition; f) fatigue of those meeting in the afternoon sessions; and
g) procedures, if any, for checking whether the morning session teachers
talked to their afternoon session colleagues.

27. Our Reply. a) In our opinion, the doodlebug problem was adequately
described both on page 310 and in footnote 8. Further, an accessible reference
where a still more complete description can be found is provided.

28. b) through e). Printing costs are high and there are more papers than
scholars have time to read. These facts argue for a judicious choice of those
facts and details to be described in the research report itself. Clearly,
that information which has the most bearing on the validity of the comparison
between the two groups and on the generalizability of the findings should
be included. For example, the investigators thought it more important to mention
the name of the city than the ages of the teachers. A student argued that if
the teachers differed in age they could not be considered "peers" regardless of
where they were on the "organizational chart." Previous research can give us
clues about what variables are likely to be important and thus worthy of
description in the research report.

29. f) and g). Since treatments were randomly assigned to session times, and
since only small differences between sessions on the dependent variables were
noted, it does not seem important to us that the fatigue and prior knowledge
differences of the two groups be described.

30. Student Responses. "Several problems of varying types should have been
used."

"A larger cross-section of the population should be used, and not just
teaching personnel."

31. Our Reply. These investigators wanted to make very general statements
about group structure and group problem solving. It is essential that they
design their research in a way that enhances the generalizability of their
findings. One way they increased the generalizability of their work was by
including several measures of problem solving ability. Had they not given
the same problem to all 20 groups and had they used other types of hierarchically
differentiated groups (the two student suggestions quoted above) their study









would have had that much more value. We do not believe that most researchers
give enough thought or effort to designing studies to maximize generalizability.
Ways this can be done without increasing the cost of the research are described
by Millman.* A list of ways that research can be said to generalize is
presented by Bracht and Glass.**



Results 

32. The Results section includes a description of the measures used to
represent the dependent variables of production, efficiency and risk taking, as well
as a statistical comparison between the two types of groups on these measures.

33. The Measures . 

Production . Using the number of beliefs overcome as a measure of produc- 
tion seems reasonable enough, although one could argue that the three beliefs 
should not be given equal weight. 


34. Efficiency. Time to overcome the first belief is a good measure to test
hypothesis two since it is in the early stages of group work that relative
differences in speed of performance are expected. According to the
investigators' predictions, developing the pattern of interpersonal relationships
needed for efficient problem solving "will require more time in hierarchically
differentiated groups than in undifferentiated groups." (p. 308) This time
consuming process will produce a difference in efficiency more evident in the
beginning of the problem solving situation than at the end.

35. The distribution of time to overcome the first belief is likely to be
skewed, with a few groups taking relatively a very long time. Such groups will
have a disproportionate effect on the mean of all 10 or 20 groups. Further,
one could argue that taking an extra minute of time early in the problem solving
effort should count more than an extra minute after the group already has worked
15 or 20 minutes. For both of these reasons, it would have been a good idea
to use as the index of efficiency not time per se but some function of the time
score such as the reciprocal of time (i.e., one divided by the time score) or
logarithm of time. Such functions have the desired properties.
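
The two properties claimed for the reciprocal and the logarithm can be checked with a short sketch. The times below are hypothetical values made up for illustration (they are not data from the study), and the helper names are our own.

```python
import math

# Hypothetical minutes to overcome the first belief for ten groups;
# one group is very slow. These are NOT data from the study.
times = [3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 8.0, 28.0]


def extra_minute_effect(transform, minutes):
    """Change in the transformed score caused by one additional minute."""
    return abs(transform(minutes + 1.0) - transform(minutes))


def share_of_total(scores, index):
    """Fraction of the sum (and so of the mean) contributed by one score."""
    return scores[index] / sum(scores)


def identity(t):
    return t


def reciprocal(t):
    return 1.0 / t


# Property 1: an extra minute early should count more than one late.
for name, f in [("raw", identity), ("reciprocal", reciprocal), ("log", math.log)]:
    early = extra_minute_effect(f, 3.0)
    late = extra_minute_effect(f, 20.0)
    print(f"{name:10s} early={early:.4f} late={late:.4f}")

# Property 2: the slowest group should dominate the mean less.
raw_share = share_of_total(times, -1)
log_share = share_of_total([math.log(t) for t in times], -1)
print(f"slowest group's share of total: raw={raw_share:.3f} log={log_share:.3f}")
```

With raw time an extra minute has the same effect early or late, whereas under either transformation the early minute produces the larger change, and the slow group's contribution to the mean of the log scores is much smaller than its contribution to the mean of the raw times.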



* In the Service of Generalization, Psychology in the Schools, 1966, 3, 333-339.

** The External Validity of Experiments, American Educational Research Journal,
1968, 5, 437-474.









36. Risk taking. The risk taking measure used by the investigators is the
difference between the number of generated solutions and the number presented
to the experimenter. A large difference actually means low risk taking because
the group seems unwilling to "risk" presenting solutions to the experimenter.

37. To name such a measure "risk taking" implies there is something to be lost
in suggesting inaccurate solutions to the experimenter and that something
is being risked in presenting other than the correct answer to the problem. Since
the groups were in no way penalized for presenting such incorrect answers,
what risk is involved to the group is not clear. The individual is said to risk
"failure in the eyes of his superior." But this fear of failure of the
individual is not reflected in the group difference score which is used as the risk
taking index. Thus, we do not believe this group risk taking index is a measure
of risk taking in the usual sense, or in the sense used in organizational
theories, but more a measure of how reasonable the suggested solutions seemed
to the group involved.

38. The definition of risk taking given by the investigators on page 312
is a stipulated definition and not an operational definition. To be an
operational definition, the operations or procedures that must be followed to get the
discrepancy index are needed. Of course, a researcher may give a stipulated
definition of his operational definition; not all stipulated definitions are
operational definitions.

39. Statistical Analysis . 

The student should note that although 80 individuals were involved, the
investigators correctly compared only the 20 group results. The group, and not
the individual, is indeed the correct unit for analysis.

40. The likely skewness of the distribution of the "efficiency" measures has
already been commented upon. The "productivity" measure also represents a
skewed distribution since most of the groups must have overcome all three beliefs
in order for the mean scores to be so close to the maximum score of three.
Thus, as was true for the efficiency measure, a few groups which could not get
off the ground, so to speak, would have a disproportionate effect on the mean
productivity score for all ten groups. The investigators should have presented
more of the groups' performance than merely the means.*



* Further, because of non-normal distributions and likely large differences
in variability between the two types of groups, the mathematical assumptions
of normality and homogeneity of variance underlying the proper use of the t
test are being violated in the testing of hypotheses 1 and 2. The effect of
these violations on the accuracy of the significance test may be quite minimal,
however.




Add after footnote, page 7

Note from the investigators: When the assumptions constituting the statistical
model for a test are not met, doubt arises concerning the meaningfulness of a
probability statement about the hypothesis in question. There is some empirical
evidence to show that slight deviations from the assumptions underlying parametric
tests may not have radical effects on the obtained probability figure (Sidney
Siegel, Nonparametric Statistics for the Behavioral Sciences, New York:
McGraw-Hill Book Company, Inc., 1956) and that major effects are likely to occur only
when the sample is small (William L. Hays, Statistics for Psychologists, New York:
Holt, Rinehart and Winston, 1963). What constitutes a slight deviation or a small
sample is unclear, however. In light of the confused picture and to satisfy our own
curiosity, we analyzed data by means of the t test and the Mann-Whitney U test, a
non-parametric statistic. The results were identical. As the reviewers noted, the
effects may indeed be quite minimal.









41. Since the groups were matched by schools, the appropriate t test involves
comparing 10 matched pairs instead of two independent sets of 10 groups each.
A different formula for computing the t statistic should have been used.*

42. We also take exception to the use of one-tailed tests. (Recall the special
notes in regard to one-tailed tests.) The use of one-tailed tests is most
defensible when there is no reasonable way to explain results in favor of the
hierarchically differentiated groups. For example, contrary to hypothesis 3, it might
be that in hierarchically differentiated groups, generated solutions are more apt
to be presented (i.e., greater risk taking exhibited) because subordinates would
not want to offend their peers in front of the principal. Had two-tailed tests been
used instead of one-tailed tests, the first two hypotheses in the paper would not
have been statistically significant.
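
How a result can be significant under a one-tailed test yet not under a two-tailed test is sketched below. The critical values are the standard .05-level entries of a t table for 9 and 18 degrees of freedom; the t value of 1.90 is hypothetical and is not a figure reported by the investigators. The function name is our own.

```python
# Standard t-table critical values at the .05 level, keyed by degrees of freedom.
CRITICAL_T_05 = {
    9: {"one_tailed": 1.833, "two_tailed": 2.262},
    18: {"one_tailed": 1.734, "two_tailed": 2.101},
}


def is_significant(t_value, df, tails):
    """Return True if t_value is significant at the .05 level.

    A one-tailed test counts only results in the predicted (positive)
    direction; a two-tailed test counts large values in either direction.
    """
    if tails == 1:
        return t_value > CRITICAL_T_05[df]["one_tailed"]
    if tails == 2:
        return abs(t_value) > CRITICAL_T_05[df]["two_tailed"]
    raise ValueError("tails must be 1 or 2")


hypothetical_t = 1.90  # illustrative value only
one_tailed_result = is_significant(hypothetical_t, df=18, tails=1)
two_tailed_result = is_significant(hypothetical_t, df=18, tails=2)
print(one_tailed_result, two_tailed_result)

# A result in the unpredicted direction can never be significant one-tailed.
print(is_significant(-3.0, df=18, tails=1), is_significant(-3.0, df=18, tails=2))
```

The last line shows the price paid for the one-tailed test: an outcome opposite to the prediction, however large, cannot be declared significant.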


43. Regardless of the t formula used or how many tailed tests were employed,
the following interpretations seem reasonable: for each of the three dependent
variables there were noticeable differences between the average performance of
the groups of each type; it appears unlikely, but still possible, that chance
alone accounts for these differences.

44. In the last two paragraphs on page 312, the authors perform two additional
analyses. They test whether there is a difference on the dependent variables
between the before-school groups and after-school groups, and they compute the
correlations among the dependent variables. Had these differences or correlations
been large, it would have suggested modifications in the interpretations of their
results. The investigators should be commended for taking these precautions and
for searching for rival explanations.

45. Student Responses. "I really do not have enough background in statistics
to evaluate this section well." "We have not covered this kind of statistics in
class." "This section (due to my complete density in the area of knowledge of
statistics) is impossible for me to comment on as it was all foreign to me."

46. Our Reply. Of course the kind of discussion we gave in some of the
paragraphs 39-44 of the model appraisal does require a statistical sophistication.
However, do not be led into thinking that because you lack this sophistication
you cannot look at the results of studies critically. The writing of paragraphs
33-38 did not require this sophistication. Without statistical expertise you
can still question whether the data presented are relevant to the questions
asked. Don't give up too quickly.

* The different formula would have 9 degrees of freedom instead of the 18
reported by the investigators. If, on the average, the two groups from the same
school were more alike in their problem solving behavior than differentiated
and undifferentiated groups from different schools (as we suspect them to be),
then a higher value of t would result. From the data available to us, we suspect
that had the investigators used the t formula for matched pairs the results would
have been even more statistically significant.
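
The footnote's argument can be illustrated with a small sketch that computes both t statistics on the same scores. The data are hypothetical (ten invented schools whose two groups resemble each other), not the investigators' results, and the function names are our own.

```python
import math
from statistics import mean, stdev, variance


def independent_t(x, y):
    """Pooled-variance t for two independent samples; returns (t, df)."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2


def matched_pairs_t(x, y):
    """t computed on the within-pair differences; returns (t, df)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical minutes to overcome the first belief, one pair of groups
# per school (NOT data from the study). Schools differ a good deal from
# one another, but the two groups within a school are alike.
undifferentiated = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
differentiated = [7.0, 7.0, 10.0, 10.0, 10.0, 12.0, 14.0, 13.0, 15.0, 16.0]

t_ind, df_ind = independent_t(differentiated, undifferentiated)
t_pair, df_pair = matched_pairs_t(differentiated, undifferentiated)
print(f"independent groups: t = {t_ind:.2f}, df = {df_ind}")
print(f"matched pairs:      t = {t_pair:.2f}, df = {df_pair}")
```

Because the matched-pairs formula removes the school-to-school variation from the error term, its t is much larger for these invented scores even though it has only 9 degrees of freedom rather than 18. The advantage disappears if groups from the same school are no more alike than groups from different schools.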






Paragraph 42

Note from the investigators: There are those who, like us, feel that a one-tailed
test can be used when there is a theoretical basis for a directional hypothesis
(Allen L. Edwards, Statistical Methods for the Behavioral Sciences, New York:
Rinehart and Company, Inc., 1953); there are others, however, who feel that the
potential for misusing a directional hypothesis is substantial (Gene V. Glass and
Julian Stanley, Statistical Methods in Education and Psychology, Englewood
Cliffs, New Jersey: Prentice-Hall, Inc., 1970). The only statement which can be
made with certainty is that a debate over the merits of testing directional versus
nondirectional hypotheses has raged for the past twenty years (e.g., see Cletus J.
Burke, "A Brief Note on One-Tailed Tests," Psychological Bulletin, 50 (1953),
384-87; and David B. Peizer, "A Note on Directional Inference," Psychological
Bulletin, 68 (1967), 448).






Student Response. "The results didn't allow for different intelligence
or personalities of individuals."


Our Reply. The student could mean two things by her statement. First,
she could mean that the procedures did not equate groups on intelligence or
personality. To that we would reply that the random assignment of teachers
to groups has the effect that such initial group differences in intelligence
or personality would be due to chance alone and they can be estimated by techniques
of statistical inference. Alternately, the student could mean that the results
did not provide separate analyses for individuals of different intelligence
or personality. To that we would reply that such an analysis would have to be
for the group as a whole (criterion scores are for the groups, not individuals
within the groups). The small number of groups (10 within each treatment) would
make such an analysis of limited value.


Discussion



The discussion, perhaps misnamed, consists of the investigators' attempts
to provide evidence relevant to three rival hypotheses: 1) the lower proportion
of solutions presented to the experimenter in the hierarchically differentiated
groups was due to the tendency of ideas advanced by low ranking members to be
passed over rather than to a reluctance on the part of subordinates to take risks
(pages 313-315, first two lines); 2) a reluctance of subordinates to criticize
the ideas of superordinates and/or an uneven distribution of social support
was the reason for greater productivity in the undifferentiated groups (p. 315-317);
and 3) the curtailment of competition for respect in the differentiated
groups was responsible for the differences in productivity between the two types
of groups.

Some of our objections to what is written in the Discussion section parallel
remarks made in connection with our appraisal of the Results section. Our
displeasure with the risk taking measure remains. See * below for the remainder
of this paragraph.

Perhaps most disconcerting is the investigators' belief that the number of
ideas initiated as a measure of the degree to which group energies are mobilized
is a serious test of the competition for respect explanation. (We wonder why
the investigators are so willing to accept Blau and Scott's third explanatory
factor after they rejected the first two.)



* The t test should have made use of the fact that the schools were matched.
The chi-square tests are inappropriate since the responses of the same person
are represented by more than one frequency in the table and thus the independence
assumption underlying the proper use of the chi-square test was violated.



principal as an observer. He will have a certain effect on the situation."

53. Our Reply. Recall that the purpose of the additional study was to
determine if "...the lower proportion of solutions presented to the experimenter
in the hierarchically differentiated groups was not due to a reluctance by
subordinates to take risks, but rather to the tendency of ideas advanced by
low-ranking group members to be overlooked." (p. 313) To determine which
of these is more likely it was necessary, as the investigators did, to design
a situation in which the same reluctance by subordinates to take risks was
possible (i.e., principal present) but one in which the principal had no chance
to overlook subordinates' ideas (i.e., present but no active role).


Concluding Remarks

54. The phrase, "tend to confirm", in the first sentence under the Concluding
Remarks section is too strong. "Confirm" suggests that the evidence is now
sufficient to warrant acceptance of the conclusion. We do not believe the
investigators meant to give such assurance.

55. We commend the investigators for mentioning ways in which the research is
still incomplete (e.g., they did not investigate the centralist constitutional
arrangement or problem solving at the synthesis phase) and for pointing to
needed research on the topic.

A Summary of Our Assessment

56. The problem the investigators set out to study is an important one and
their study provides a good illustration of the close and sensitive
integration of theory and data. We see the choice of the doodlebug problem as an
unfortunate one and further object that the researchers offer no evidence
that they have successfully manipulated the hierarchical differentiation variable.
The investigators did take pains not only to test their predictions but also
to examine the assumptions upon which their predictions were based. We believe
that the investigators went about their research business in order to protect
themselves from improper inference and not just to convince others that they
had conducted their study properly.






